RADIANCE: Reliable and interpretable depression detection from speech using transformer

Gupta, Anup Kumar; Dhamaniya, Ashutosh; Gupta, Puneet

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/15048

Title:	RADIANCE: Reliable and interpretable depression detection from speech using transformer
Authors:	Gupta, Anup Kumar; Dhamaniya, Ashutosh; Gupta, Puneet
Keywords:	Audio modality;Depression detection;Interpretability;Trustworthiness;Vision transformer
Issue Date:	2024
Publisher:	Elsevier Ltd
Citation:	Gupta, A. K., Dhamaniya, A., & Gupta, P. (2024). RADIANCE: Reliable and interpretable depression detection from speech using transformer. Computers in Biology and Medicine. https://doi.org/10.1016/j.compbiomed.2024.109325
Abstract:	Depression is a common but severe mental disorder that adversely impacts an individual's ability to function normally in day-to-day life. A majority of depressed individuals remain undiagnosed due to factors such as social stigma and a shortage of healthcare professionals. Consequently, several Machine Learning and Deep Learning (DL) models based on speech have been proposed for automatic depression detection, with the latter generally outperforming the former. However, DL models are black boxes that offer no transparency. In contrast, healthcare professionals prefer models that are interpretable as well as accurate. In this direction, we propose RADIANCE (Reliable AnD InterpretAble depressioN deteCtion transformErs). RADIANCE incorporates a novel FilterBank VIsion Transformer (FBViT) network, which provides the symptoms of depression as interpretable features. Additionally, we employ a novel loss function that handles the class imbalance in the datasets and incorporates a penalty term that addresses the hierarchy of misclassification errors. We also propose a reliability predictor based on low-level descriptors that provides a reliability score indicating the trustworthiness of each FBViT prediction. Furthermore, in contrast to conventional averaging and majority pooling, RADIANCE consolidates predictions from multiple clips of the input audio by weighting each prediction according to its reliability score, yielding a more accurate overall prediction. RADIANCE outperforms state-of-the-art depression detection methods, achieving accuracies of 89.36%, 80.36%, and 94.44% on the DAIC-WOZ, E-DAIC, and CMDC datasets, respectively. Further, RADIANCE achieves mean absolute error (MAE) scores of 3.27 and 5.04 on the DAIC-WOZ and E-DAIC datasets, respectively. © 2024 Elsevier Ltd
URI:	https://doi.org/10.1016/j.compbiomed.2024.109325 https://dspace.iiti.ac.in/handle/123456789/15048
ISSN:	0010-4825
Type of Material:	Journal Article
Appears in Collections:	Department of Computer Science and Engineering
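
Illustrative note: the abstract contrasts reliability-weighted consolidation of clip-level predictions with conventional averaging and majority pooling. The following is a minimal sketch of that contrast, not the authors' implementation. It assumes each clip yields a depression probability in [0, 1] and a reliability score in [0, 1], and that the weights are the reliability scores normalized by their sum; the function names and the example values are hypothetical.

```python
from typing import Sequence


def mean_pool(probs: Sequence[float]) -> float:
    """Conventional averaging: every clip counts equally."""
    if not probs:
        raise ValueError("at least one clip prediction is required")
    return sum(probs) / len(probs)


def majority_pool(probs: Sequence[float], threshold: float = 0.5) -> float:
    """Majority pooling: fraction of clips whose prediction crosses the threshold."""
    if not probs:
        raise ValueError("at least one clip prediction is required")
    votes = [1 if p >= threshold else 0 for p in probs]
    return sum(votes) / len(votes)


def reliability_weighted_pool(
    probs: Sequence[float], reliabilities: Sequence[float]
) -> float:
    """Weight each clip prediction by its reliability score (assumed scheme)."""
    if len(probs) != len(reliabilities):
        raise ValueError("one reliability score is required per clip")
    total = sum(reliabilities)
    if total == 0:
        # No clip is considered reliable: fall back to plain averaging.
        return mean_pool(probs)
    return sum(p * r for p, r in zip(probs, reliabilities)) / total


# Hypothetical example: one confident, reliable clip and two unreliable ones.
clip_probs = [0.9, 0.2, 0.3]
clip_reliability = [0.9, 0.1, 0.2]

averaged = mean_pool(clip_probs)
voted = majority_pool(clip_probs)
weighted = reliability_weighted_pool(clip_probs, clip_reliability)
print(f"mean={averaged:.3f} majority={voted:.3f} weighted={weighted:.3f}")
```

In this made-up example, the averaged and majority scores fall below 0.5 while the reliability-weighted score stays above it, because the single reliable clip dominates the two unreliable ones.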

Files in This Item:

There are no files associated with this item.
