Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/18147
Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Dewangan, Lipika | en_US |
| dc.contributor.author | Kumar Maurya, Chandresh | en_US |
| dc.date.accessioned | 2026-05-14T12:28:14Z | - |
| dc.date.available | 2026-05-14T12:28:14Z | - |
| dc.date.issued | 2026 | - |
| dc.identifier.citation | Dewangan, L., & Kumar Maurya, C. (2026). CMF_HIT: Enhancing code-mixed aspect-based sentiment analysis via language-aware gradient-based tokenization and feature fusion. Expert Systems with Applications, 316. https://doi.org/10.1016/j.eswa.2026.131639 | en_US |
| dc.identifier.issn | 0957-4174 | - |
| dc.identifier.other | EID(2-s2.0-105032726526) | - |
| dc.identifier.uri | https://dx.doi.org/10.1016/j.eswa.2026.131639 | - |
| dc.identifier.uri | https://dspace.iiti.ac.in:8080/jspui/handle/123456789/18147 | - |
| dc.description.abstract | Code-mixing, the blending of multiple languages within a single utterance, is common in multilingual communities and on social media. Accurately identifying fine-grained sentiment in such text is crucial for applications such as social media analytics and customer feedback mining. However, code-mixed analysis is challenging due to limited annotated data, noisy transliterations, and the absence of robust language-invariant representations. To address these challenges, we introduce bilingual code-mixed datasets for Aspect-Based Sentiment Analysis (ABSA) in four Indic languages (Odia, Bengali, Marathi, Hindi) mixed with English. The datasets are annotated with aspect terms, aspect categories, and sentiment polarities. We propose CMF_HIT, a hybrid framework that fuses contextual features from XLM-R, semantic cues from CNN-ELMO-TFIDF, and syntactic signals from a Hierarchical Transformer (HIT). Feature fusion is performed through a gated weighting mechanism combined with a decorrelation regularizer to limit redundancy beyond naive feature concatenation. To further improve segmentation, we design a Language-Aware Gradient-based Tokenizer (LAGT) with aspect-aware masking that reduces fragmentation of multi-word aspect expressions in noisy code-mixed inputs. CMF_HIT jointly addresses three ABSA subtasks: aspect term extraction (ATE), aspect category detection (ACD), and aspect polarity classification (APC). Across four Indic-English datasets, the model consistently outperforms state-of-the-art baselines. It achieves up to 9.3% improvement in ACD and 6.8% in ATE, with steady gains in APC. These results confirm that the tokenizer and architecture provide additive benefits, with the hybrid fusion framework remaining the dominant contributor to performance. © 2026 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Elsevier Ltd | en_US |
| dc.source | Expert Systems with Applications | en_US |
| dc.title | CMF_HIT: Enhancing code-mixed aspect-based sentiment analysis via language-aware gradient-based tokenization and feature fusion | en_US |
| dc.type | Journal Article | en_US |
| Appears in Collections: | Department of Computer Science and Engineering | |