Automatic speaker and age identification of children from raw speech using sincNet over ERB scale

Pachori, Ram Bilas

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/13766

Title:	Automatic speaker and age identification of children from raw speech using sincNet over ERB scale
Authors:	Pachori, Ram Bilas
Keywords:	Convolutional neural networks; Gender-dependent age identification; Gender-independent age identification; Non-native children's English speech corpus; Raw speech signals; SincNet; Speaker identification
Issue Date:	2024
Publisher:	Elsevier B.V.
Citation:	Radha, K., Bansal, M., & Pachori, R. B. (2024). Automatic speaker and age identification of children from raw speech using sincNet over ERB scale. Speech Communication. https://doi.org/10.1016/j.specom.2024.103069
Abstract:	This paper presents the newly developed non-native children's English speech (NNCES) corpus and uses it to report findings on automatic speaker and age recognition from raw speech. Convolutional neural networks (CNNs), which can learn low-level speech representations, can be fed raw speech signals directly instead of traditional hand-crafted features. However, the filters learned by standard CNNs tend to be noisy, because every element of each filter is learned as a free parameter. SincNet, in contrast, generates more meaningful filters simply by replacing the first convolutional layer of a standard CNN with a sinc layer. In the sinc layer, the only learnable parameters of each rectangular band-pass filter are its low and high cutoff frequencies, which gives the model the potential to extract salient speaker cues such as pitch and formants. In this work, the SincNet model is modified by initializing these filters on the equivalent rectangular bandwidth (ERB) scale instead of the baseline Mel scale, which has the added benefit of allocating more filters to the lower region of the spectrum. Notably, the modified SincNet model is well suited to identifying the age of children. In experiments on both read and spontaneous speech, covering speaker identification and gender-independent and gender-dependent age-group identification of children, the proposed model outperforms the baseline models, with relative accuracy improvements that vary across tasks. © 2024 Elsevier B.V.
URI:	https://doi.org/10.1016/j.specom.2024.103069 https://dspace.iiti.ac.in/handle/123456789/13766
ISSN:	0167-6393
Type of Material:	Journal Article
Appears in Collections:	Department of Electrical Engineering
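The abstract describes two mechanisms: sinc-layer band-pass filters parameterized only by their low and high cutoff frequencies, and initializing those cutoffs on the ERB scale rather than the Mel scale. The following is a minimal, dependency-free Python sketch of both under stated assumptions, not the authors' implementation. The ERB-rate formula (the Glasberg and Moore form, 21.4·log10(1 + 0.00437·f)), the 30 Hz lower bound, equal spacing of filter edges on each scale, and the helper names init_cutoffs and sinc_bandpass are illustrative assumptions. The 80-filter, 251-sample, Hamming-windowed setup echoes the original SincNet paper; whether this work uses the same values is not stated in the record. In a trained model the cutoffs would be learnable parameters updated by backpropagation; here they are plain numbers.

```python
import math

# Mel scale: m = 2595 * log10(1 + f / 700)
def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# ERB-rate scale (Glasberg & Moore form); an assumption, the paper's exact formula may differ
def hz_to_erb_rate(f):
    return 21.4 * math.log10(1.0 + 0.00437 * f)

def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

SCALES = {"mel": (hz_to_mel, mel_to_hz), "erb": (hz_to_erb_rate, erb_rate_to_hz)}

def init_cutoffs(n_filters, sample_rate, scale="erb", f_min=30.0):
    """Hypothetical helper: (low, high) cutoffs in Hz from edges equally spaced on `scale`."""
    to_scale, from_scale = SCALES[scale]
    lo, hi = to_scale(f_min), to_scale(sample_rate / 2.0)  # f_min up to Nyquist
    step = (hi - lo) / n_filters
    edges = [from_scale(lo + i * step) for i in range(n_filters + 1)]
    # Adjacent edges give each filter's low and high cutoff frequency
    return list(zip(edges[:-1], edges[1:]))

def sinc_bandpass(f1_hz, f2_hz, sample_rate, length=251):
    """Hypothetical helper: difference of two ideal low-pass sincs, Hamming-windowed."""
    f1, f2 = f1_hz / sample_rate, f2_hz / sample_rate  # normalized cutoffs (cycles/sample)
    half = (length - 1) // 2
    kernel = []
    for k in range(length):
        n = k - half  # time index centred on the middle tap
        if n == 0:
            h = 2.0 * (f2 - f1)  # limit of the expression below as n -> 0
        else:
            # 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n), with sinc(x) = sin(x)/x
            h = (math.sin(2 * math.pi * f2 * n) - math.sin(2 * math.pi * f1 * n)) / (math.pi * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1))  # Hamming window
        kernel.append(h * window)
    return kernel

# Build an ERB-initialized filterbank: 80 kernels of 251 taps at 16 kHz
filterbank = [sinc_bandpass(lo, hi, 16000) for lo, hi in init_cutoffs(80, 16000, "erb")]
```

For example, comparing init_cutoffs(80, 16000, "erb") with init_cutoffs(80, 16000, "mel") shows more, and therefore narrower, bands below 1 kHz on the ERB grid, which is the low-frequency allocation the abstract attributes to ERB initialization.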
