Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/18147
Full metadata record
DC Field | Value | Language
dc.contributor.author | Dewangan, Lipika | en_US
dc.contributor.author | Kumar Maurya, Chandresh | en_US
dc.date.accessioned | 2026-05-14T12:28:14Z | -
dc.date.available | 2026-05-14T12:28:14Z | -
dc.date.issued | 2026 | -
dc.identifier.citation | Dewangan, L., & Kumar Maurya, C. (2026). CMF_Hit: Enhancing code-Mixed aspect-Based sentiment analysis via language-Aware gradient-Based tokenization and feature fusion. Expert Systems with Applications, 316. https://doi.org/10.1016/j.eswa.2026.131639 | en_US
dc.identifier.issn | 0957-4174 | -
dc.identifier.other | EID(2-s2.0-105032726526) | -
dc.identifier.uri | https://dx.doi.org/10.1016/j.eswa.2026.131639 | -
dc.identifier.uri | https://dspace.iiti.ac.in:8080/jspui/handle/123456789/18147 | -
dc.description.abstract | Code-mixing, the blending of multiple languages within a single utterance, is common in multilingual communities and on social media. Accurately identifying fine-grained sentiment in such text is crucial for applications such as social media analytics and customer feedback mining. However, code-mixed analysis is challenging due to limited annotated data, noisy transliterations, and the absence of robust language-invariant representations. To address these challenges, we introduce bilingual code-mixed datasets for Aspect-Based Sentiment Analysis (ABSA) in four Indic languages (Odia, Bengali, Marathi, Hindi) mixed with English. The datasets are annotated with aspect terms, aspect categories, and sentiment polarities. We propose CMF_HIT, a hybrid framework that fuses contextual features from XLM-R, semantic cues from CNN-ELMO-TFIDF, and syntactic signals from a Hierarchical Transformer (HIT). Feature fusion is performed through a gated weighting mechanism combined with a decorrelation regularizer to limit redundancy beyond naive feature concatenation. To further improve segmentation, we design a Language-Aware Gradient-based Tokenizer (LAGT) with aspect-aware masking that reduces fragmentation of multi-word aspect expressions in noisy code-mixed inputs. CMF_HIT jointly addresses three ABSA subtasks: aspect term extraction (ATE), aspect category detection (ACD), and aspect polarity classification (APC). Across four Indic-English datasets, the model consistently outperforms state-of-the-art baselines. It achieves up to 9.3% improvement in ACD and 6.8% in ATE, with steady gains in APC. These results confirm that the tokenizer and architecture provide additive benefits, with the hybrid fusion framework remaining the dominant contributor to performance. © 2026 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies. | en_US
dc.language.iso | en | en_US
dc.publisher | Elsevier Ltd | en_US
dc.source | Expert Systems with Applications | en_US
dc.title | CMF_Hit: Enhancing code-Mixed aspect-Based sentiment analysis via language-Aware gradient-Based tokenization and feature fusion | en_US
dc.type | Journal Article | en_US
Appears in Collections: Department of Computer Science and Engineering

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
