Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/4804
Full metadata record
DC Field | Value | Language
dc.contributor.author | Mishra, Saumya | en_US
dc.contributor.author | Gupta, Anup Kumar | en_US
dc.contributor.author | Gupta, Puneet | en_US
dc.date.accessioned | 2022-03-17T01:00:00Z | -
dc.date.accessioned | 2022-03-17T15:35:33Z | -
dc.date.available | 2022-03-17T01:00:00Z | -
dc.date.available | 2022-03-17T15:35:33Z | -
dc.date.issued | 2021 | -
dc.identifier.citation | Mishra, S., Gupta, A. K., & Gupta, P. (2021). DARE: Deceiving Audio–Visual speech recognition model. Knowledge-Based Systems, 232. doi:10.1016/j.knosys.2021.107503 | en_US
dc.identifier.issn | 0950-7051 | -
dc.identifier.other | EID(2-s2.0-85115893673) | -
dc.identifier.uri | https://doi.org/10.1016/j.knosys.2021.107503 | -
dc.identifier.uri | https://dspace.iiti.ac.in/handle/123456789/4804 | -
dc.description.abstract | Audio–Visual speech recognition (AVSR) is an effective way to predict the text corresponding to spoken words using both audio and face videos, even in a noisy environment. These models find extensive applications in areas such as assisting the hearing-impaired, biometric verification and speaker verification. Adversarial examples are created by adding imperceptible perturbations to the original input, causing deep learning models to produce an incorrect classification. Attacking an AVSR model is quite challenging, as the audio and visual modalities complement each other. Moreover, the correlation between audio and video features decreases while crafting an adversarial example, which can be exploited to detect the adversarial example. We propose an end-to-end targeted attack, Deceiving Audio–visual speech Recognition model (DARE), which performs an imperceptible adversarial attack while remaining undetected by the existing synchronisation-based detection network, SyncNet. To this end, we are the first to perform an adversarial attack that fools the AVSR model and SyncNet simultaneously. Experimental results on a publicly available dataset using a state-of-the-art AVSR model reveal that the proposed attack successfully deceives the AVSR model while remaining undetected. Furthermore, our DARE attack circumvents well-known defences while maintaining a 100% targeted attack success rate. © 2021 Elsevier B.V. | en_US
dc.language.iso | en | en_US
dc.publisher | Elsevier B.V. | en_US
dc.source | Knowledge-Based Systems | en_US
dc.subject | Character recognition | en_US
dc.subject | Deep learning | en_US
dc.subject | Speech recognition | en_US
dc.subject | Adversarial attack | en_US
dc.subject | Audiovisual speech recognition | en_US
dc.subject | Biometric verification | en_US
dc.subject | Cross modality | en_US
dc.subject | Detection networks | en_US
dc.subject | Hearing impaired | en_US
dc.subject | Noisy environment | en_US
dc.subject | Recognition models | en_US
dc.subject | Speaker verification | en_US
dc.subject | Spoken words | en_US
dc.subject | Audition | en_US
dc.title | DARE: Deceiving Audio–Visual speech Recognition model | en_US
dc.type | Journal Article | en_US
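
The abstract above describes adversarial examples as imperceptible perturbations that push a deep learning model towards an attacker-chosen output. For readers unfamiliar with how such targeted perturbations are crafted in general, the sketch below shows a generic projected-gradient-descent (PGD) style targeted attack. It is illustrative only and is not the DARE method from the paper: `model`, `x`, and `target` are hypothetical placeholders, and the actual DARE attack additionally keeps the perturbed audio and video consistent enough to evade the SyncNet synchronisation check, which this sketch does not model.

```python
# Minimal, generic sketch of a targeted adversarial attack (PGD-style).
# NOT the DARE attack from the paper; `model`, `x`, and `target` are
# hypothetical placeholders supplied by the caller.
import torch
import torch.nn.functional as F

def targeted_pgd(model, x, target, eps=0.03, alpha=0.005, steps=40):
    """Craft a targeted adversarial example inside an L-infinity ball of radius eps.

    model  -- a classifier returning logits of shape (batch, num_classes)
    x      -- input tensor in [0, 1], shape (batch, ...)
    target -- LongTensor of attacker-chosen class indices, shape (batch,)
    """
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Loss towards the attacker-chosen target class
        loss = F.cross_entropy(model(x_adv), target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Descend on the target loss (targeted attack), one signed step at a time
        x_adv = x_adv.detach() - alpha * grad.sign()
        # Project back into the eps-ball around the original input and the valid range
        x_adv = torch.clamp(x_adv, x - eps, x + eps).clamp(0, 1)
    return x_adv.detach()
```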
Appears in Collections: Department of Computer Science and Engineering

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
