DARE: Deceiving Audio–Visual speech Recognition model

Mishra, Saumya; Gupta, Anup Kumar; Gupta, Puneet

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/4804

Title:	DARE: Deceiving Audio–Visual speech Recognition model
Authors:	Mishra, Saumya; Gupta, Anup Kumar; Gupta, Puneet
Keywords:	Character recognition;Deep learning;Speech recognition;Adversarial attack;Audiovisual speech recognition;Biometric verification;Cross modality;Detection networks;Hearing impaired;Noisy environment;Recognition models;Speaker verification;Spoken words;Audition
Issue Date:	2021
Publisher:	Elsevier B.V.
Citation:	Mishra, S., Gupta, A. K., & Gupta, P. (2021). DARE: Deceiving Audio–Visual speech recognition model. Knowledge-Based Systems, 232. doi:10.1016/j.knosys.2021.107503
Abstract:	Audio–Visual speech recognition (AVSR) is an effective way to predict text corresponding to the spoken words using both audio and face videos, even in a noisy environment. These models find extensive applications in fields such as assisting the hearing-impaired, biometric verification, and speaker verification. Adversarial examples are created by adding imperceptible perturbations to the original input, resulting in an incorrect classification by deep learning models. Attacking an AVSR model is quite challenging, as the audio and visual modalities complement each other. Moreover, the correlation between audio and video features decreases while crafting an adversarial example, and this drop can be used to detect the adversarial example. We propose an end-to-end targeted attack, Deceiving Audio–Visual speech Recognition model (DARE), which successfully performs an imperceptible adversarial attack while remaining undetected by the existing synchronisation-based detection network, SyncNet. To this end, we are the first to perform an adversarial attack that fools the AVSR model and SyncNet simultaneously. Experimental results on a publicly available dataset using a state-of-the-art AVSR model reveal that the proposed attack can successfully deceive the AVSR model while remaining undetected. Furthermore, our DARE attack circumvents well-known defences while maintaining a 100% targeted attack success rate. © 2021 Elsevier B.V.
URI:	https://doi.org/10.1016/j.knosys.2021.107503; https://dspace.iiti.ac.in/handle/123456789/4804
ISSN:	0950-7051
Type of Material:	Journal Article
Appears in Collections:	Department of Computer Science and Engineering

Files in This Item:

There are no files associated with this item.