Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/16684
Full metadata record
DC Field | Value | Language
dc.contributor.author | Rehman, Mohammad Zia Ur | en_US
dc.contributor.author | Bansal, Shubhi | en_US
dc.contributor.author | Kumar, Nagendra | en_US
dc.date.accessioned | 2025-09-04T12:41:58Z | -
dc.date.available | 2025-09-04T12:41:58Z | -
dc.date.issued | 2026 | -
dc.identifier.citation | Rehman, M. Z. U., Raghuvanshi, D., Bansal, S., & Kumar, N. (2026). A multimodal–multitask framework with cross-modal relation and hierarchical interactive attention for semantic comprehension. Information Fusion, 126. https://doi.org/10.1016/j.inffus.2025.103628 | en_US
dc.identifier.issn | 1566-2535 | -
dc.identifier.other | EID(2-s2.0-105013882808) | -
dc.identifier.uri | https://dx.doi.org/10.1016/j.inffus.2025.103628 | -
dc.identifier.uri | https://dspace.iiti.ac.in:8080/jspui/handle/123456789/16684 | -
dc.description.abstract | A major challenge in multimodal learning is the presence of noise within individual modalities. This noise inherently affects the resulting multimodal representations, especially when these representations are obtained through explicit interactions between different modalities. Moreover, multimodal fusion techniques, while aiming to achieve a strong joint representation, can neglect valuable discriminative information within the individual modalities. To this end, we propose a Multimodal-Multitask framework with crOss-modal Relation and hIErarchical iNteractive aTtention (MM-ORIENT) that is effective for multiple tasks. The proposed approach acquires multimodal representations cross-modally without explicit interaction between different modalities, reducing the noise effect at the latent stage. To achieve this, we propose cross-modal relation graphs that reconstruct monomodal features to acquire multimodal representations. The features are reconstructed based on the node neighborhood, where the neighborhood is decided by the features of a different modality. We also propose Hierarchical Interactive Monomodal Attention (HIMA) to focus on pertinent information within a modality. While cross-modal relation graphs help comprehend high-order relationships between two modalities, HIMA helps in multitasking by learning discriminative features of individual modalities before late-fusing them. Finally, extensive experimental evaluation on three datasets demonstrates that the proposed approach effectively comprehends multimodal content for multiple tasks. The code is available in the GitHub repository: https://github.com/devraj-raghuvanshi/MM-ORIENT. (An illustrative sketch of the cross-modal relation-graph idea appears after this metadata table.) © 2025 Elsevier B.V. All rights reserved. | en_US
dc.language.iso | en | en_US
dc.publisher | Elsevier B.V. | en_US
dc.source | Information Fusion | en_US
dc.subject | Cross-modal Learning | en_US
dc.subject | Generative AI Augmentation | en_US
dc.subject | Hate Speech Detection | en_US
dc.subject | Multimodal–multitask Learning | en_US
dc.subject | Semantic Comprehension | en_US
dc.subject | Sentiment Analysis | en_US
dc.subject | Interactive Computer Systems | en_US
dc.subject | Latent Semantic Analysis | en_US
dc.subject | Learning Systems | en_US
dc.subject | Modal Analysis | en_US
dc.subject | Multi-task Learning | en_US
dc.subject | Multitasking | en_US
dc.subject | Semantics | en_US
dc.subject | Cross-modal | en_US
dc.subject | Multi-modal | en_US
dc.subject | Multitask Learning | en_US
dc.subject | Speech Detection | en_US
dc.title | A multimodal–multitask framework with cross-modal relation and hierarchical interactive attention for semantic comprehension | en_US
dc.type | Journal Article | en_US
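
The abstract above describes reconstructing monomodal features over a relation graph whose neighborhoods are decided by the other modality. Below is a minimal illustrative sketch of that idea only, not the authors' implementation (see their repository for the actual code): the attention-based alignment, the cosine-similarity top-k graph, the function name cross_modal_relation_reconstruct, and the hyperparameter k are all assumptions made for illustration.

```python
# Minimal sketch: reconstruct modality-A features over an A-A graph whose
# neighborhoods are decided by modality B. This is one plausible reading of
# the abstract, not the paper's actual construction.
import torch
import torch.nn.functional as F


def cross_modal_relation_reconstruct(x_a, x_b, k=3):
    """Reconstruct modality-A features using a graph built from modality B.

    x_a: (n_a, d) monomodal features to reconstruct (e.g., text token embeddings)
    x_b: (n_b, d) other-modality features (e.g., image region embeddings),
         assumed to be already projected to the same dimension d
    k:   number of neighbors kept per node (hypothetical hyperparameter)
    """
    # Step 1: describe every A-node in terms of modality B via soft attention.
    attn = torch.softmax(x_a @ x_b.t() / x_b.size(-1) ** 0.5, dim=-1)  # (n_a, n_b)
    b_context = attn @ x_b                                             # (n_a, d)

    # Step 2: the relation graph over A-nodes uses similarity of those
    # B-derived contexts, so the neighborhood is decided by modality B.
    sim = F.cosine_similarity(b_context.unsqueeze(1), b_context.unsqueeze(0), dim=-1)
    topk_idx = sim.topk(k=min(k, sim.size(-1)), dim=-1).indices
    mask = torch.zeros_like(sim).scatter_(1, topk_idx, 1.0)
    weights = torch.softmax(sim.masked_fill(mask == 0, float("-inf")), dim=-1)

    # Step 3: reconstruct A by aggregating A's own features over that
    # neighborhood; B shapes the graph but is never mixed into A directly.
    return weights @ x_a  # (n_a, d)


if __name__ == "__main__":
    text = torch.randn(6, 64)   # e.g., 6 token embeddings
    image = torch.randn(9, 64)  # e.g., 9 region embeddings
    print(cross_modal_relation_reconstruct(text, image).shape)  # torch.Size([6, 64])
```

Under these assumptions, the other modality influences only the graph structure, while the reconstructed representation aggregates the modality's own features, which is one way to obtain multimodal information without explicitly mixing feature vectors across modalities.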
Appears in Collections: Department of Computer Science and Engineering

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
