Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/18147
| Title: | CMF_HIT: Enhancing code-mixed aspect-based sentiment analysis via language-aware gradient-based tokenization and feature fusion |
| Authors: | Dewangan, Lipika; Kumar Maurya, Chandresh |
| Issue Date: | 2026 |
| Publisher: | Elsevier Ltd |
| Citation: | Dewangan, L., & Kumar Maurya, C. (2026). CMF_HIT: Enhancing code-mixed aspect-based sentiment analysis via language-aware gradient-based tokenization and feature fusion. Expert Systems with Applications, 316. https://doi.org/10.1016/j.eswa.2026.131639 |
| Abstract: | Code-mixing, the blending of multiple languages within a single utterance, is common in multilingual communities and on social media. Accurately identifying fine-grained sentiment in such text is crucial for applications such as social media analytics and customer feedback mining. However, code-mixed analysis is challenging due to limited annotated data, noisy transliterations, and the absence of robust language-invariant representations. To address these challenges, we introduce bilingual code-mixed datasets for Aspect-Based Sentiment Analysis (ABSA) in four Indic languages (Odia, Bengali, Marathi, Hindi) mixed with English. The datasets are annotated with aspect terms, aspect categories, and sentiment polarities. We propose CMF_HIT, a hybrid framework that fuses contextual features from XLM-R, semantic cues from CNN-ELMO-TFIDF, and syntactic signals from a Hierarchical Transformer (HIT). Feature fusion is performed through a gated weighting mechanism combined with a decorrelation regularizer to limit redundancy beyond naive feature concatenation. To further improve segmentation, we design a Language-Aware Gradient-based Tokenizer (LAGT) with aspect-aware masking that reduces fragmentation of multi-word aspect expressions in noisy code-mixed inputs. CMF_HIT jointly addresses three ABSA subtasks: aspect term extraction (ATE), aspect category detection (ACD), and aspect polarity classification (APC). Across four Indic-English datasets, the model consistently outperforms state-of-the-art baselines. It achieves up to 9.3% improvement in ACD and 6.8% in ATE, with steady gains in APC. These results confirm that the tokenizer and architecture provide additive benefits, with the hybrid fusion framework remaining the dominant contributor to performance. © 2026 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies. |
| URI: | https://dx.doi.org/10.1016/j.eswa.2026.131639 https://dspace.iiti.ac.in:8080/jspui/handle/123456789/18147 |
| ISSN: | 0957-4174 |
| Type of Material: | Journal Article |
| Appears in Collections: | Department of Computer Science and Engineering |
Files in This Item:
There are no files associated with this item.