Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/18147
Title: CMF_HIT: Enhancing code-mixed aspect-based sentiment analysis via language-aware gradient-based tokenization and feature fusion
Authors: Dewangan, Lipika
Kumar Maurya, Chandresh
Issue Date: 2026
Publisher: Elsevier Ltd
Citation: Dewangan, L., & Kumar Maurya, C. (2026). CMF_HIT: Enhancing code-mixed aspect-based sentiment analysis via language-aware gradient-based tokenization and feature fusion. Expert Systems with Applications, 316. https://doi.org/10.1016/j.eswa.2026.131639
Abstract: Code-mixing, the blending of multiple languages within a single utterance, is common in multilingual communities and on social media. Accurately identifying fine-grained sentiment in such text is crucial for applications such as social media analytics and customer feedback mining. However, code-mixed analysis is challenging due to limited annotated data, noisy transliterations, and the absence of robust language-invariant representations. To address these challenges, we introduce bilingual code-mixed datasets for Aspect-Based Sentiment Analysis (ABSA) in four Indic languages (Odia, Bengali, Marathi, Hindi) mixed with English. The datasets are annotated with aspect terms, aspect categories, and sentiment polarities. We propose CMF_HIT, a hybrid framework that fuses contextual features from XLM-R, semantic cues from CNN-ELMO-TFIDF, and syntactic signals from a Hierarchical Transformer (HIT). Feature fusion is performed through a gated weighting mechanism combined with a decorrelation regularizer to limit redundancy beyond naive feature concatenation. To further improve segmentation, we design a Language-Aware Gradient-based Tokenizer (LAGT) with aspect-aware masking that reduces fragmentation of multi-word aspect expressions in noisy code-mixed inputs. CMF_HIT jointly addresses three ABSA subtasks: aspect term extraction (ATE), aspect category detection (ACD), and aspect polarity classification (APC). Across four Indic-English datasets, the model consistently outperforms state-of-the-art baselines. It achieves up to 9.3% improvement in ACD and 6.8% in ATE, with steady gains in APC. These results confirm that the tokenizer and architecture provide additive benefits, with the hybrid fusion framework remaining the dominant contributor to performance. © 2026 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
URI: https://dx.doi.org/10.1016/j.eswa.2026.131639
https://dspace.iiti.ac.in:8080/jspui/handle/123456789/18147
ISSN: 0957-4174
Type of Material: Journal Article
Appears in Collections: Department of Computer Science and Engineering



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
