Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/16684
Full metadata record
DC Field | Value | Language
dc.contributor.author | Rehman, Mohammad Zia Ur | en_US
dc.contributor.author | Bansal, Shubhi | en_US
dc.contributor.author | Kumar, Nagendra | en_US
dc.date.accessioned | 2025-09-04T12:41:58Z | -
dc.date.available | 2025-09-04T12:41:58Z | -
dc.date.issued | 2026 | -
dc.identifier.citation | Rehman, M. Z. U., Raghuvanshi, D., Bansal, S., & Kumar, N. (2026). A multimodal–multitask framework with cross-modal relation and hierarchical interactive attention for semantic comprehension. Information Fusion, 126. https://doi.org/10.1016/j.inffus.2025.103628 | en_US
dc.identifier.issn | 1566-2535 | -
dc.identifier.other | EID(2-s2.0-105013882808) | -
dc.identifier.uri | https://dx.doi.org/10.1016/j.inffus.2025.103628 | -
dc.identifier.uri | https://dspace.iiti.ac.in:8080/jspui/handle/123456789/16684 | -
dc.description.abstract | A major challenge in multimodal learning is the presence of noise within individual modalities. This noise inherently affects the resulting multimodal representations, especially when these representations are obtained through explicit interactions between different modalities. Moreover, multimodal fusion techniques, while aiming to achieve a strong joint representation, can neglect valuable discriminative information within the individual modalities. To this end, we propose a Multimodal-Multitask framework with crOss-modal Relation and hIErarchical iNteractive aTtention (MM-ORIENT) that is effective for multiple tasks. The proposed approach acquires multimodal representations cross-modally without explicit interaction between different modalities, reducing the noise effect at the latent stage. To achieve this, we propose cross-modal relation graphs that reconstruct monomodal features to acquire multimodal representations. The features are reconstructed based on the node neighborhood, where the neighborhood is decided by the features of a different modality. We also propose Hierarchical Interactive Monomodal Attention (HIMA) to focus on pertinent information within a modality. While cross-modal relation graphs help comprehend high-order relationships between two modalities, HIMA helps in multitasking by learning discriminative features of individual modalities before late-fusing them. Finally, extensive experimental evaluation on three datasets demonstrates that the proposed approach effectively comprehends multimodal content for multiple tasks. The code is available in the GitHub repository: https://github.com/devraj-raghuvanshi/MM-ORIENT. (An illustrative sketch of the cross-modal relation-graph idea appears after this metadata table.) © 2025 Elsevier B.V. All rights reserved. | en_US
dc.language.iso | en | en_US
dc.publisher | Elsevier B.V. | en_US
dc.source | Information Fusion | en_US
dc.subject | Cross-modal Learning | en_US
dc.subject | Generative AI Augmentation | en_US
dc.subject | Hate Speech Detection | en_US
dc.subject | Multimodal–multitask Learning | en_US
dc.subject | Semantic Comprehension | en_US
dc.subject | Sentiment Analysis | en_US
dc.subject | Interactive Computer Systems | en_US
dc.subject | Latent Semantic Analysis | en_US
dc.subject | Learning Systems | en_US
dc.subject | Modal Analysis | en_US
dc.subject | Multi-task Learning | en_US
dc.subject | Multitasking | en_US
dc.subject | Semantics | en_US
dc.subject | Cross-modal | en_US
dc.subject | Multi-modal | en_US
dc.subject | Multitask Learning | en_US
dc.subject | Speech Detection | en_US
dc.title | A multimodal–multitask framework with cross-modal relation and hierarchical interactive attention for semantic comprehension | en_US
dc.type | Journal Article | en_US
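
The abstract above describes reconstructing monomodal features over a relation graph whose neighborhoods are decided by the other modality. Below is a minimal illustrative sketch of that idea only, not the authors' implementation (see their repository for the actual code): the attention-based alignment, the cosine-similarity top-k graph, the function name cross_modal_relation_reconstruct, and the hyperparameter k are all assumptions made for illustration.

```python
# Minimal sketch: reconstruct modality-A features over an A-A graph whose
# neighborhoods are decided by modality B. This is one plausible reading of
# the abstract, not the paper's actual construction.
import torch
import torch.nn.functional as F


def cross_modal_relation_reconstruct(x_a, x_b, k=3):
    """Reconstruct modality-A features using a graph built from modality B.

    x_a: (n_a, d) monomodal features to reconstruct (e.g., text token embeddings)
    x_b: (n_b, d) other-modality features (e.g., image region embeddings),
         assumed to be already projected to the same dimension d
    k:   number of neighbors kept per node (hypothetical hyperparameter)
    """
    # Step 1: describe every A-node in terms of modality B via soft attention.
    attn = torch.softmax(x_a @ x_b.t() / x_b.size(-1) ** 0.5, dim=-1)  # (n_a, n_b)
    b_context = attn @ x_b                                             # (n_a, d)

    # Step 2: the relation graph over A-nodes uses similarity of those
    # B-derived contexts, so the neighborhood is decided by modality B.
    sim = F.cosine_similarity(b_context.unsqueeze(1), b_context.unsqueeze(0), dim=-1)
    topk_idx = sim.topk(k=min(k, sim.size(-1)), dim=-1).indices
    mask = torch.zeros_like(sim).scatter_(1, topk_idx, 1.0)
    weights = torch.softmax(sim.masked_fill(mask == 0, float("-inf")), dim=-1)

    # Step 3: reconstruct A by aggregating A's own features over that
    # neighborhood; B shapes the graph but is never mixed into A directly.
    return weights @ x_a  # (n_a, d)


if __name__ == "__main__":
    text = torch.randn(6, 64)   # e.g., 6 token embeddings
    image = torch.randn(9, 64)  # e.g., 9 region embeddings
    print(cross_modal_relation_reconstruct(text, image).shape)  # torch.Size([6, 64])
```

Under these assumptions, the other modality influences only the graph structure, while the reconstructed representation aggregates the modality's own features, which is one way to obtain multimodal information without explicitly mixing feature vectors across modalities.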
Appears in Collections: Department of Computer Science and Engineering

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
