Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/13766
Full metadata record
DC Field | Value | Language
dc.contributor.author | Pachori, Ram Bilas | en_US
dc.date.accessioned | 2024-06-28T11:38:12Z | -
dc.date.available | 2024-06-28T11:38:12Z | -
dc.date.issued | 2024 | -
dc.identifier.citation | Radha, K., Bansal, M., & Pachori, R. B. (2024). Automatic speaker and age identification of children from raw speech using sincNet over ERB scale. Speech Communication. Scopus. https://doi.org/10.1016/j.specom.2024.103069 | en_US
dc.identifier.issn | 0167-6393 | -
dc.identifier.other | EID(2-s2.0-85190133382) | -
dc.identifier.uri | https://doi.org/10.1016/j.specom.2024.103069 | -
dc.identifier.uri | https://dspace.iiti.ac.in/handle/123456789/13766 | -
dc.description.abstract | This paper presents the newly developed non-native children's English speech (NNCES) corpus and reports findings on automatic speaker and age recognition from raw speech. Convolutional neural networks (CNNs), which can learn low-level speech representations, can be fed raw speech signals directly instead of traditional hand-crafted features. However, the filters learned by standard CNNs tend to be noisy because every element of each filter is a free parameter. In contrast, sincNet generates more meaningful filters simply by replacing the first convolutional layer of a standard CNN with a sinc layer. In sincNet, the only learnable parameters of each rectangular band-pass filter are its low and high cutoff frequencies, which gives the model the potential to extract significant speaker cues such as pitch and formants. In this work, the sincNet model is significantly modified by replacing the baseline Mel-scale initialization of the filters with an equivalent rectangular bandwidth (ERB) scale initialization, which has the added benefit of allocating more filters to the lower region of the spectrum. Notably, the modified sincNet model is well suited to identifying the age of children. On both read and spontaneous speech tasks, in speaker identification as well as gender-independent and gender-dependent age-group identification of children, the proposed approach outperforms the baseline models, with relative accuracy improvements that vary across tasks. © 2024 Elsevier B.V. | en_US
dc.language.iso | en | en_US
dc.publisher | Elsevier B.V. | en_US
dc.source | Speech Communication | en_US
dc.subject | Convolutional neural networks | en_US
dc.subject | Gender-dependent age identification | en_US
dc.subject | Gender-independent age identification | en_US
dc.subject | Non-native children's English speech corpus | en_US
dc.subject | Raw speech signals | en_US
dc.subject | SincNet | en_US
dc.subject | Speaker identification | en_US
dc.title | Automatic speaker and age identification of children from raw speech using sincNet over ERB scale | en_US
dc.type | Journal Article | en_US
Appears in Collections:Department of Electrical Engineering

Files in This Item:
There are no files associated with this item.
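The abstract describes two mechanisms: a sincNet first layer in which each filter is a band-pass defined only by a learnable low and high cutoff frequency, and the switch from Mel-scale to ERB-scale initialization of those cutoffs. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The Mel formula (2595 * log10(1 + f/700)), the Glasberg and Moore ERB-rate formula (21.4 * log10(1 + 0.00437 * f)), the 16 kHz sampling rate, 40 filters, 251-tap kernel, and Hamming window are all assumed values, and the helper names (band_edges, sinc_bandpass, magnitude_response) are hypothetical. Each kernel is built as the difference of two windowed ideal low-pass (sinc) responses, which yields an approximately rectangular pass band between the two cutoffs.

```python
import math


def hz_to_mel(f):
    # Assumed Mel formula: m = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def hz_to_erb_rate(f):
    # Assumed ERB-rate formula (Glasberg & Moore): E = 21.4 * log10(1 + 0.00437 * f)
    return 21.4 * math.log10(1.0 + 0.00437 * f)


def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def band_edges(n_filters, f_min, f_max, to_scale, from_scale):
    """Hypothetical helper: n_filters + 1 edge frequencies (Hz) equally spaced on a scale.

    Filter k would be initialised with low cutoff edges[k] and high cutoff edges[k + 1].
    """
    lo, hi = to_scale(f_min), to_scale(f_max)
    step = (hi - lo) / n_filters
    return [from_scale(lo + k * step) for k in range(n_filters + 1)]


def _lowpass(fc, n):
    # Ideal low-pass impulse response 2*fc*sinc(2*pi*fc*n), with fc in cycles/sample
    if n == 0:
        return 2.0 * fc
    x = 2.0 * math.pi * fc * n
    return 2.0 * fc * math.sin(x) / x


def sinc_bandpass(f1_hz, f2_hz, fs, length=251):
    """Hypothetical helper: Hamming-windowed band-pass kernel with cutoffs f1_hz < f2_hz."""
    if length % 2 == 0:
        raise ValueError("length must be odd so the kernel has a centre tap")
    f1, f2 = f1_hz / fs, f2_hz / fs  # normalise cutoffs to cycles/sample
    half = (length - 1) // 2
    kernel = []
    for i in range(length):
        n = i - half
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
        # Band-pass = low-pass(f2) - low-pass(f1): the only free parameters are f1 and f2
        kernel.append((_lowpass(f2, n) - _lowpass(f1, n)) * window)
    return kernel


def magnitude_response(kernel, f_hz, fs):
    """|H(f)| of the kernel, evaluated directly from the DTFT at a single frequency."""
    half = (len(kernel) - 1) // 2
    w = 2.0 * math.pi * f_hz / fs
    re = sum(g * math.cos(w * (i - half)) for i, g in enumerate(kernel))
    im = sum(g * math.sin(w * (i - half)) for i, g in enumerate(kernel))
    return math.hypot(re, im)


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate
    mel = band_edges(40, 0.0, fs / 2, hz_to_mel, mel_to_hz)
    erb = band_edges(40, 0.0, fs / 2, hz_to_erb_rate, erb_rate_to_hz)
    print("edges below 1 kHz, Mel:", sum(f < 1000 for f in mel))
    print("edges below 1 kHz, ERB:", sum(f < 1000 for f in erb))
    k = sinc_bandpass(300.0, 600.0, fs)  # illustrative band, not taken from the paper
    print("gain at 450 Hz:", magnitude_response(k, 450.0, fs))
    print("gain at 4 kHz:", magnitude_response(k, 4000.0, fs))
```

Spacing the edges evenly on the ERB-rate scale rather than the Mel scale places more of them below 1 kHz, which matches the abstract's point that ERB initialization allocates additional filters to the lower region of the spectrum. In an actual sincNet layer the cutoffs would be trainable parameters updated by backpropagation; here they are fixed numbers, purely for illustration.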


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.