Dynamic Multimodal Machine Comprehension


This presentation summarizes an approach to machine comprehension that takes audio alongside text as input and applies dynamic inter- and intra-modality attention (DIIA). The method builds on multi-head attention and adds Multimodal Knowledge Distillation (MKD), with two goals: bridging the gap between the textual and audio domains, and keeping the model usable in both unimodal and multimodal scenarios.

  • Machine Comprehension
  • Multimodal
  • Audio-Oriented
  • Dynamic Attention
  • Multimodal Knowledge

Uploaded on Feb 22, 2025



Presentation Transcript


  1. Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention. Zhiqi Huang, Fenglin Liu, Xian Wu, Shen Ge, Helin Wang, Wei Fan, Yuexian Zou. AAAI 2021.

  2. Introduction: Machine Comprehension. Input: Passage (P), Question (Q), Candidate choices (C_candidate). Output: Predicted choices (C_pre). Diagram labels: P, Q, C_can, C_pre; Unimodal Comprehension.


  6. Introduction: Audio-Oriented Machine Comprehension. Input: Audio (A), Passage (P), Question (Q), Candidate choices (C_candidate). Output: Predicted choices (C_pre). Diagram labels: P, Q, A, C_can, C_pre; Multimodal Comprehension.

  7. Introduction: Audio-Oriented Machine Comprehension. Challenges: (1) Bridge the gap between the textual and audio domains. (2) Enable the model to work in unimodal scenarios.

  8. Methodology: Dynamic Inter- and Intra-modality Attention. DIIA: Dynamic Inter- and Intra-modality Attention. MKD: Multimodal Knowledge Distillation.

  9. Methodology: Multi-Head Attention. Multi-Head Attention: [equation not captured in the transcript]. Feed-Forward Network: [equation not captured in the transcript].

  10. Methodology: Inter-modality Attention. Audio Attention; Passage Attention. Diagram labels: query, key, value.

  11. Methodology: Intra-modality Attention. Diagram label: Intra Attention.

  12. Methodology: Multimodal Knowledge Distillation

  13. Implementation

  14. Experiments

  15. Experiments. (1) Passage attention focuses on time words, nouns, and transitional words. (2) Tone information can be extracted from the audio.

  16. Conclusion: Take-home message. Multiple losses and modality consistency; distilling knowledge from multiple modalities; design of the attention mechanism.
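The inter- and intra-modality attention slides name queries, keys, and values but give no formulas. The following is a minimal sketch of the general idea, assuming standard single-head scaled dot-product attention; it is not the authors' implementation. The function names, the toy feature vectors, and the choice to show both cross-modal directions are assumptions for illustration, since the transcript does not state which modality supplies the queries in "Audio Attention" and "Passage Attention". A multi-head version would run several such heads in parallel on learned projections, which is omitted here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention (standard formulation).

    queries: list of d-dimensional vectors.
    keys, values: lists of equal length; keys are d-dimensional.
    Returns one output vector per query, plus the attention weights.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy features: 2 audio frames and 3 passage tokens, dimension 2.
audio = [[1.0, 0.0], [0.0, 1.0]]
passage = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Inter-modality: queries come from one modality, keys and values from the other.
audio_from_passage, w_ap = attention(audio, passage, passage)
passage_from_audio, w_pa = attention(passage, audio, audio)

# Intra-modality: a modality attends over itself (self-attention).
audio_self, w_aa = attention(audio, audio, audio)
```

In this sketch the only difference between the inter- and intra-modality cases is where the keys and values come from, which is why one `attention` function serves both.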
