Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/18202
Title: Raw Waveform Modeling for Speech Emotion Recognition Across Diverse Datasets
Authors: Pachori, Ram Bilas
Issue Date: 2026
Publisher: Springer Nature
Citation: Bansal, M., Shukla, S., & Pachori, R. B. (2026). Raw Waveform Modeling for Speech Emotion Recognition Across Diverse Datasets. Circuits, Systems, and Signal Processing. https://doi.org/10.1007/s00034-026-03602-6
Abstract: Human speech, a fundamental mode of communication, has been extensively studied in speech processing. Speech emotion recognition (SER), a subfield of artificial intelligence and signal processing, aims to automatically identify and interpret emotional states from spoken language. Prior research, however, has predominantly focused on specific datasets and a limited set of handcrafted features. This work introduces a dual modeling approach that combines raw-waveform analysis with mel-frequency cepstral coefficient (MFCC)-based layered convolutional neural network (CNN) architectures, capturing both temporal and perceptual features for robust SER. To address real-world challenges, the datasets were augmented with additive noise, and experiments were conducted on four benchmark datasets to evaluate the model’s generalizability under noisy conditions. In the first experiment, using raw speech, the proposed model achieved accuracies of 94% on the Surrey audio-visual expressed emotion database (SAVEE), 91.25% on the Ryerson audio-visual database of emotional speech and song (RAVDESS), 84.75% on the Berlin database of emotional speech (EMODB), and 79% on the interactive emotional dyadic motion capture database (IEMOCAP). In the second experiment, using data augmented with additive white Gaussian noise (AWGN), accuracy improved to 92.5% on EMODB, 92.75% on RAVDESS, and 90.25% on IEMOCAP, while accuracy on SAVEE fell to 85.5%. Experimental results demonstrate that the proposed hybrid approach outperforms conventional models, highlighting its potential for reliable emotion classification in both general and healthcare contexts. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2026.
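The abstract reports that the second experiment used data augmented with additive white Gaussian noise, but it does not state the noise level. The following is a minimal sketch of one common way to add AWGN at a chosen signal-to-noise ratio: the noise power is set to P_n = P_s / 10^(SNR_dB / 10), where P_s is the mean power of the clean signal. The helper names `add_awgn` and `measured_snr_db`, the 440 Hz test tone, the 16 kHz sample rate, and the 10 dB SNR are illustrative assumptions, not the authors' settings.

```python
import math
import random
from typing import List, Optional


def add_awgn(signal: List[float], snr_db: float, seed: Optional[int] = None) -> List[float]:
    """Return a noisy copy of `signal` at the target SNR (hypothetical helper).

    Assumption: the SNR actually used in the paper is not given in the abstract.
    """
    if not signal:
        raise ValueError("signal must be non-empty")
    rng = random.Random(seed)
    # Mean signal power: P_s = (1/N) * sum(x[n]^2)
    signal_power = sum(x * x for x in signal) / len(signal)
    # From SNR_dB = 10 * log10(P_s / P_n), solve for the noise power P_n
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    # Zero-mean Gaussian noise with variance P_n, added sample by sample
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def measured_snr_db(clean: List[float], noisy: List[float]) -> float:
    """Empirical SNR in dB between a clean signal and its noisy version."""
    noise = [n - c for c, n in zip(clean, noisy)]
    p_s = sum(c * c for c in clean) / len(clean)
    p_n = sum(e * e for e in noise) / len(noise)
    return 10 * math.log10(p_s / p_n)


if __name__ == "__main__":
    sr = 16000  # assumed sample rate in Hz
    # One second of a 440 Hz tone standing in for a speech clip
    clean = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    noisy = add_awgn(clean, snr_db=10.0, seed=0)
    print(f"measured SNR: {measured_snr_db(clean, noisy):.2f} dB")
```

Scaling the noise to the power of each clip, rather than using a fixed noise amplitude, keeps the degradation comparable across recordings with different loudness, which matters when pooling datasets as varied as SAVEE, RAVDESS, EMODB, and IEMOCAP.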
URI: https://dx.doi.org/10.1007/s00034-026-03602-6
https://dspace.iiti.ac.in:8080/jspui/handle/123456789/18202
ISSN: 0278-081X
Type of Material: Journal Article
Appears in Collections: Department of Electrical Engineering

Files in This Item:
There are no files associated with this item.


