RADIANCE: Reliable and interpretable depression detection from speech using transformer

Gupta, Anup Kumar; Dhamaniya, Ashutosh; Gupta, Puneet

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/15048

Full metadata record

DC Field	Value	Language
dc.contributor.author	Gupta, Anup Kumar	en_US
dc.contributor.author	Dhamaniya, Ashutosh	en_US
dc.contributor.author	Gupta, Puneet	en_US
dc.date.accessioned	2024-12-24T05:20:01Z	-
dc.date.available	2024-12-24T05:20:01Z	-
dc.date.issued	2024	-
dc.identifier.citation	Gupta, A. K., Dhamaniya, A., & Gupta, P. (2024). RADIANCE: Reliable and interpretable depression detection from speech using transformer. Computers in Biology and Medicine. Scopus. https://doi.org/10.1016/j.compbiomed.2024.109325	en_US
dc.identifier.issn	0010-4825	-
dc.identifier.other	EID(2-s2.0-85207802232)	-
dc.identifier.uri	https://doi.org/10.1016/j.compbiomed.2024.109325	-
dc.identifier.uri	https://dspace.iiti.ac.in/handle/123456789/15048	-
dc.description.abstract	Depression is a common but severe mental disorder that adversely impacts the ability of an individual to function normally in their day-to-day life. A majority of depressed individuals remain undiagnosed due to factors such as social stigma and a shortage of healthcare professionals. Consequently, several Machine Learning and Deep Learning (DL) models based on speech have been proposed for automatic depression detection, with the latter generally outperforming the former. However, DL models are black boxes and offer no transparency, whereas healthcare professionals prefer models that are interpretable as well as accurate. In this direction, we propose a method, RADIANCE (Reliable AnD InterpretAble depressioN deteCtion transformErs). RADIANCE incorporates a novel FilterBank VIsion Transformer (FBViT) network, which provides the symptoms of depression as interpretable features. Additionally, we employ a novel loss function that handles the class imbalance issue in the datasets and incorporates a penalty term that addresses the hierarchy of misclassification errors. We also propose a reliability predictor based on low-level descriptors that provides a reliability score to indicate the trustworthiness of the prediction by FBViT. Furthermore, in contrast to conventional averaging and majority pooling, RADIANCE consolidates predictions from multiple clips of the input audio by weighting each prediction by its reliability score, ensuring a more accurate overall prediction. RADIANCE outperforms the state-of-the-art depression detection methods, achieving accuracies of 89.36%, 80.36%, and 94.44% on the DAIC-WOZ, E-DAIC, and CMDC datasets, respectively. Further, RADIANCE achieves MAE scores of 3.27 and 5.04 on the DAIC-WOZ and E-DAIC datasets, respectively. © 2024 Elsevier Ltd	en_US
dc.language.iso	en	en_US
dc.publisher	Elsevier Ltd	en_US
dc.source	Computers in Biology and Medicine	en_US
dc.subject	Audio modality	en_US
dc.subject	Depression detection	en_US
dc.subject	Interpretability	en_US
dc.subject	Trustworthiness	en_US
dc.subject	Vision transformer	en_US
dc.title	RADIANCE: Reliable and interpretable depression detection from speech using transformer	en_US
dc.type	Journal Article	en_US
Appears in Collections:	Department of Computer Science and Engineering
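
Illustrative note on the pooling step described in the abstract: the record does not give the exact formula RADIANCE uses to combine clip-level predictions, so the following is only a minimal sketch under the assumption that pooling is a normalized weighted average of per-clip probabilities. The function name pool_clip_predictions, its parameters, and the fallback to plain averaging when all reliability scores are zero are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def pool_clip_predictions(
    probs: Sequence[float], reliabilities: Sequence[float]
) -> float:
    """Combine per-clip probabilities into one recording-level score.

    ASSUMPTION: a normalized weighted average; the paper's actual
    weighting scheme may differ.
    """
    if len(probs) == 0 or len(probs) != len(reliabilities):
        raise ValueError("need one reliability score per clip prediction")
    if any(r < 0 for r in reliabilities):
        raise ValueError("reliability scores must be non-negative")

    total = sum(reliabilities)
    if total == 0:
        # ASSUMPTION: with no usable reliability information,
        # fall back to conventional averaging.
        return sum(probs) / len(probs)

    # Clips with higher reliability contribute more to the final score.
    return sum(p * r for p, r in zip(probs, reliabilities)) / total


if __name__ == "__main__":
    # Three clips from one recording; the second is judged unreliable.
    clip_probs = [0.9, 0.2, 0.8]
    clip_reliability = [0.7, 0.1, 0.6]
    print(pool_clip_predictions(clip_probs, clip_reliability))
```

Compared with plain averaging, a low-reliability clip here pulls the pooled score only slightly, which is the behaviour the abstract attributes to reliability-based consolidation.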

Files in This Item:

There are no files associated with this item.
