Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/10388
Title: Estimation of speaker characteristics from speech signal
Authors: Gupta, Tarun
Singh, Ranveer [Guide]
Siong, Chng Eng [Guide]
Keywords: Computer Science and Engineering
Issue Date: 20-May-2022
Publisher: Department of Computer Science and Engineering, IIT Indore
Series/Report no.: BTP581;CSE 2022 GUP
Abstract: Estimating speaker attributes such as age and height is a difficult task with several applications in speech forensic analysis and potential applications in speaker verification and speaker adaptation techniques. In this project, we present a bi-encoder transformer mixture model, inspired by Mixture of Experts (MoE), for estimating speaker age and height. To extract features specific to male and female voice characteristics, we propose two separate transformer encoders that share wav2vec 2.0 as a common feature extractor. The bi-encoder architecture is chosen because male and female voice characteristics differ significantly. This architecture improves the model's generalizability by reducing interference effects during training. We run our experiments on the TIMIT corpus and find that our age estimation results surpass the current state of the art. For male and female age estimation, we obtain root mean squared errors (RMSE) of 5.54 years and 6.49 years, respectively. Further analysis of the relative impact of different phonetic sound types on speaker profiling shows that vowel phonemes are the most distinctive for age estimation.
URI: https://dspace.iiti.ac.in/handle/123456789/10388
Type of Material: B.Tech Project
Appears in Collections: Department of Computer Science and Engineering_BTP
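
Note on the metric: the abstract reports age estimation error as root mean squared error (RMSE), separately for male and female speakers. The following is a minimal sketch of how such a per-group RMSE could be computed. It is not taken from the project itself: the function names `rmse` and `rmse_by_group`, the group labels, and the toy ages are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_by_group(predicted, actual, groups):
    """Compute RMSE separately for each group label (e.g. speaker gender)."""
    buckets = defaultdict(lambda: ([], []))
    for p, a, g in zip(predicted, actual, groups):
        buckets[g][0].append(p)
        buckets[g][1].append(a)
    return {g: rmse(ps, acts) for g, (ps, acts) in buckets.items()}


# Hypothetical toy data (ages in years); not results from the project.
predicted_ages = [30.0, 40.0, 25.0, 50.0]
true_ages = [27.0, 44.0, 28.0, 46.0]
genders = ["M", "F", "M", "F"]

scores = rmse_by_group(predicted_ages, true_ages, genders)
```

Because the errors are squared before averaging, RMSE penalizes large individual errors more heavily than a mean absolute error would, which is worth keeping in mind when comparing the male and female figures quoted in the abstract.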

Files in This Item:
File: BTP_581_Tarun_Gupta_180001059.pdf
Size: 724.78 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.