CMF_HIT: Enhancing code-mixed aspect-based sentiment analysis via language-aware gradient-based tokenization and feature fusion

Dewangan, Lipika; Kumar Maurya, Chandresh

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/18147

Full metadata record

DC Field	Value	Language
dc.contributor.author	Dewangan, Lipika	en_US
dc.contributor.author	Kumar Maurya, Chandresh	en_US
dc.date.accessioned	2026-05-14T12:28:14Z	-
dc.date.available	2026-05-14T12:28:14Z	-
dc.date.issued	2026	-
dc.identifier.citation	Dewangan, L., & Kumar Maurya, C. (2026). CMF_HIT: Enhancing code-mixed aspect-based sentiment analysis via language-aware gradient-based tokenization and feature fusion. Expert Systems with Applications, 316. https://doi.org/10.1016/j.eswa.2026.131639	en_US
dc.identifier.issn	0957-4174	-
dc.identifier.other	EID(2-s2.0-105032726526)	-
dc.identifier.uri	https://dx.doi.org/10.1016/j.eswa.2026.131639	-
dc.identifier.uri	https://dspace.iiti.ac.in:8080/jspui/handle/123456789/18147	-
dc.description.abstract	Code-mixing, the blending of multiple languages within a single utterance, is common in multilingual communities and on social media. Accurately identifying fine-grained sentiment in such text is crucial for applications such as social media analytics and customer feedback mining. However, code-mixed analysis is challenging due to limited annotated data, noisy transliterations, and the absence of robust language-invariant representations. To address these challenges, we introduce bilingual code-mixed datasets for Aspect-Based Sentiment Analysis (ABSA) in four Indic languages (Odia, Bengali, Marathi, Hindi) mixed with English. The datasets are annotated with aspect terms, aspect categories, and sentiment polarities. We propose CMF_HIT, a hybrid framework that fuses contextual features from XLM-R, semantic cues from CNN-ELMO-TFIDF, and syntactic signals from a Hierarchical Transformer (HIT). Feature fusion is performed through a gated weighting mechanism combined with a decorrelation regularizer to limit redundancy beyond naive feature concatenation. To further improve segmentation, we design a Language-Aware Gradient-based Tokenizer (LAGT) with aspect-aware masking that reduces fragmentation of multi-word aspect expressions in noisy code-mixed inputs. CMF_HIT jointly addresses three ABSA subtasks: aspect term extraction (ATE), aspect category detection (ACD), and aspect polarity classification (APC). Across four Indic-English datasets, the model consistently outperforms state-of-the-art baselines. It achieves up to 9.3% improvement in ACD and 6.8% in ATE, with steady gains in APC. These results confirm that the tokenizer and architecture provide additive benefits, with the hybrid fusion framework remaining the dominant contributor to performance. © 2026 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.	en_US
dc.language.iso	en	en_US
dc.publisher	Elsevier Ltd	en_US
dc.source	Expert Systems with Applications	en_US
dc.title	CMF_HIT: Enhancing code-mixed aspect-based sentiment analysis via language-aware gradient-based tokenization and feature fusion	en_US
dc.type	Journal Article	en_US
Appears in Collections:	Department of Computer Science and Engineering
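
The abstract describes feature fusion through a gated weighting mechanism combined with a decorrelation regularizer. The record gives no formulas, so the following is only a minimal sketch of one common way such a scheme can be set up, not the authors' implementation. It assumes softmax-normalized gate scores over equally sized branch vectors and a penalty equal to the mean squared Pearson cross-correlation between the features of two branches. The names `gated_fusion` and `decorrelation_penalty`, the fixed gate scores, and the toy inputs are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(branches, gate_scores):
    """Fuse equal-length branch vectors as a gate-weighted sum.

    Assumption: gates are a softmax over one scalar score per branch.
    In a trained model these scores would be learned; here they are inputs.
    """
    gates = softmax(gate_scores)
    dim = len(branches[0])
    fused = [sum(g * b[i] for g, b in zip(gates, branches)) for i in range(dim)]
    return fused, gates


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def decorrelation_penalty(batch_a, batch_b):
    """Mean squared cross-correlation between features of two branches.

    batch_a and batch_b are lists of feature vectors, one per example.
    Assumption: redundancy is measured feature-by-feature across the batch.
    """
    cols_a = list(zip(*batch_a))
    cols_b = list(zip(*batch_b))
    total = sum(pearson(ca, cb) ** 2 for ca in cols_a for cb in cols_b)
    return total / (len(cols_a) * len(cols_b))


# Toy usage: three branches standing in for contextual, semantic,
# and syntactic features (values are made up).
fused, gates = gated_fusion(
    [[3.0, 0.0], [0.0, 3.0], [3.0, 3.0]],
    [0.0, 0.0, 0.0],
)
redundant = decorrelation_penalty([[1.0], [2.0], [3.0]], [[1.0], [2.0], [3.0]])
independent = decorrelation_penalty([[1.0], [2.0], [3.0]], [[1.0], [-2.0], [1.0]])
```

In this sketch the penalty would be added to the task loss with a small weight, so that branches whose features move together across a batch are discouraged from duplicating each other, which plain concatenation does not prevent.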

Files in This Item:

There are no files associated with this item.
