Dynamic Multimodal Machine Comprehension

Slide Note

This paper presents an approach to machine comprehension that integrates audio-oriented multimodal input through dynamic inter- and intra-modality attention. The method uses techniques such as multi-head attention and Multimodal Knowledge Distillation (MKD) to bridge the gap between the textual and audio domains, so that the model works in both unimodal and multimodal scenarios.

Uploaded by ldav on Feb 22, 2025

Presentation Transcript

Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention
Zhiqi Huang, Fenglin Liu, Xian Wu, Shen Ge, Helin Wang, Wei Fan, Yuexian Zou
AAAI 2021

Introduction: Machine Comprehension
Input: Passage (P), Question (Q), Candidate choices (Ccandidate)
Output: Predicted choices (Cpre)
[Figure: Unimodal Comprehension; labels P, Q, Ccan, Cpre]

Introduction: Audio-Oriented Machine Comprehension
Input: Audio (A), Passage (P), Question (Q), Candidate choices (Ccandidate)
Output: Predicted choices (Cpre)
[Figure: Multimodal Comprehension; labels P, Q, A, Ccan, Cpre]
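The input and output definitions on the introduction slides can be written down as a small data structure. The following is only an illustrative sketch: the names `ComprehensionExample`, `is_multimodal`, and `predict_choice`, and the choice to represent audio as a list of floats, are assumptions made for this example and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ComprehensionExample:
    """One machine comprehension item (hypothetical container, not from the paper)."""
    passage: str                              # P
    question: str                             # Q
    candidates: Sequence[str]                 # C_candidate
    audio: Optional[Sequence[float]] = None   # A; None means text-only input

    @property
    def is_multimodal(self) -> bool:
        # The example is multimodal only when audio is supplied.
        return self.audio is not None


def predict_choice(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate (C_pre)."""
    return max(range(len(scores)), key=lambda i: scores[i])


# A text-only example and an audio-augmented one share the same structure.
text_only = ComprehensionExample("passage text", "question text", ["A", "B", "C"])
with_audio = ComprehensionExample("passage text", "question text", ["A", "B", "C"],
                                  audio=[0.0, 0.1, -0.1])
```

Making the audio field optional keeps a single interface for both the unimodal and the multimodal setting.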

Introduction: Audio-Oriented Machine Comprehension
Challenges:
1. Bridge the gap between the textual and audio domains.
2. Enable the model to work in unimodal scenarios.

Methodology: Dynamic Inter- and Intra-modality Attention
DIIA: Dynamic Inter- and Intra-modality Attention
MKD: Multimodal Knowledge Distillation
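The slides name the two components without giving formulas, so the sketch below only illustrates the general ideas behind them. It uses generic scaled dot-product attention and a standard temperature-softened KL distillation loss, which are not necessarily the paper's exact DIIA and MKD formulations. The function names, the toy feature vectors, and the pairing of a multimodal teacher with a unimodal student are all assumptions for illustration.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention: each query becomes a weighted mix of the value rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy features (assumed): 2 text tokens and 3 audio frames, each of dimension 2.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]

intra = attend(text, text, text)      # intra-modality: text attends to text
inter = attend(text, audio, audio)    # inter-modality: text attends to audio

# Assumed setup: a multimodal teacher guides a text-only student.
loss = distillation_loss([2.0, 0.5, 0.1], [1.0, 1.0, 0.2])
```

Under this assumed setup, a student trained against such a loss could be run without audio at inference time, which is one way to address the unimodal challenge listed earlier.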