Speech and speaker recognition using raw waveform modeling for adult and children's speech: A comprehensive review

Pachori, Ram Bilas

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/13157

Full metadata record

DC Field	Value	Language
dc.contributor.author	Pachori, Ram Bilas	en_US
dc.date.accessioned	2024-01-31T10:50:36Z	-
dc.date.available	2024-01-31T10:50:36Z	-
dc.date.issued	2024	-
dc.identifier.citation	Radha, K., Bansal, M., & Pachori, R. B. (2024). Speech and speaker recognition using raw waveform modeling for adult and children’s speech: A comprehensive review. Engineering Applications of Artificial Intelligence. Scopus. https://doi.org/10.1016/j.engappai.2023.107661	en_US
dc.identifier.issn	0952-1976	-
dc.identifier.other	EID(2-s2.0-85181676399)	-
dc.identifier.uri	https://doi.org/10.1016/j.engappai.2023.107661	-
dc.identifier.uri	https://dspace.iiti.ac.in/handle/123456789/13157	-
dc.description.abstract	Conventionally, the extraction of hand-crafted acoustic features has been separated from the task of establishing robust machine-learning models in speech processing. The manual approach of feature engineering is time-consuming and requires specialist knowledge, posing significant hindrances. Moreover, the resulting features may not be ideal for the desired application. The speech community has adopted raw waveform modeling to enhance performance. These techniques learn an optimized representation of the input automatically. With deep learning (DL) advancements, raw waveform modeling has become valuable for tasks like classification and prediction. The primary aim of this survey is to offer valuable insights and fill a gap in the existing literature by providing a comprehensive review of the state-of-the-art in speech and speaker recognition using raw waveform modeling for both adult and children's speech. The article covers papers from 2013–2023 and is the first to review both adult and children's databases. The article focuses on the advantages of raw waveform models. It presents essential concepts and techniques while discussing the challenges and limitations of using raw waveforms for speech and speaker recognition in both adult and children's speech. The article also evaluates recent progress in DL architectures such as SincNet, ResNet, and RawNet, and outlines future research directions in the field. © 2023	en_US
dc.language.iso	en	en_US
dc.publisher	Elsevier Ltd	en_US
dc.source	Engineering Applications of Artificial Intelligence	en_US
dc.subject	Adult speech	en_US
dc.subject	Automatic speaker recognition	en_US
dc.subject	Automatic speech recognition	en_US
dc.subject	Children's speech	en_US
dc.subject	Deep learning architectures	en_US
dc.subject	Raw waveforms modeling	en_US
dc.title	Speech and speaker recognition using raw waveform modeling for adult and children's speech: A comprehensive review	en_US
dc.type	Short Survey	en_US
Appears in Collections:	Department of Electrical Engineering

Files in This Item:

There are no files associated with this item.