TRIESTE: translation based defense for text classifiers

Gupta, Anup Kumar; Paliwal, Vardhan; Rastogi, Aryan; Gupta, Puneet

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/10613

Title:	TRIESTE: translation based defense for text classifiers
Authors:	Gupta, Anup Kumar; Paliwal, Vardhan; Rastogi, Aryan; Gupta, Puneet
Keywords:	Network security;Sentiment analysis;Translation (languages);Adversarial attack;Adversarial defense;Language processing;Natural language processing;Natural languages;Source language;State of the art;Text classifiers;Transformer;Translation;Classification (of information)
Issue Date:	2022
Publisher:	Springer Science and Business Media Deutschland GmbH
Citation:	Gupta, A. K., Paliwal, V., Rastogi, A., & Gupta, P. (2022). TRIESTE: Translation based defense for text classifiers. Journal of Ambient Intelligence and Humanized Computing. https://doi.org/10.1007/s12652-022-03859-0
Abstract:	The field of natural language processing (NLP) has significantly evolved with the advent of state-of-the-art models. The discovery of these models has entirely revolutionised how NLP tasks such as machine translation, sentiment analysis and many others are performed. However, despite their high efficacy and meticulous performance, these models are prone to adversarial attacks. Adversarial attacks involve the introduction of perturbations imperceptible to humans, which can severely impact the model’s learning and prediction accuracy. Current defenses for text data include approaches such as spell-checking and adversarial training, which have their limitations against state-of-the-art adversarial attacks. This paper puts forward an effective transformation-based defense, TRIESTE (TRanslatIon basEd defenSe for Text classifiErs). The proposed defense overcomes the shortcomings of existing defenses by translating the input text from the source language to a target language and then back to the source language before providing it to the text classifier. Translation ensures that the sentiment of the translated text is similar to that of the input text by taking the entire text into consideration, which leads to the removal of adversarial perturbations. Rigorous evaluation on publicly available datasets shows that TRIESTE is successful against state-of-the-art attacks without a significant drop in the classifier accuracy. © 2022, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
URI:	https://doi.org/10.1007/s12652-022-03859-0; https://dspace.iiti.ac.in/handle/123456789/10613
ISSN:	1868-5137
Type of Material:	Journal Article
Appears in Collections:	Department of Computer Science and Engineering
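
The round-trip translation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not name the translation model, target language, or classifier, so everything below is an assumption. The names `toy_translate`, `back_translate`, `defended_classify` and `toy_classifier` are hypothetical, and the dictionary lookup with fuzzy matching is only a stand-in for a real machine translation system, chosen so that the example runs without external dependencies.

```python
import difflib
from typing import Callable, Mapping

# Hypothetical toy vocabularies standing in for a real translation model.
EN_TO_FR = {"the": "le", "movie": "film", "was": "était",
            "great": "génial", "terrible": "horrible"}
FR_TO_EN = {fr: en for en, fr in EN_TO_FR.items()}


def toy_translate(text: str, table: Mapping[str, str]) -> str:
    """Word-by-word lookup; unknown words snap to the closest known word.

    The fuzzy match is a crude stand-in for how a real translator uses
    context to recover the intended word from a perturbed one.
    """
    out = []
    for word in text.lower().split():
        match = difflib.get_close_matches(word, list(table), n=1, cutoff=0.6)
        out.append(table[match[0]] if match else word)
    return " ".join(out)


def back_translate(text: str,
                   to_target: Callable[[str], str],
                   to_source: Callable[[str], str]) -> str:
    """Translate source -> target -> source."""
    return to_source(to_target(text))


def defended_classify(text: str,
                      classifier: Callable[[str], str],
                      to_target: Callable[[str], str],
                      to_source: Callable[[str], str]) -> str:
    """Classify the back-translated text instead of the raw input."""
    return classifier(back_translate(text, to_target, to_source))


def toy_classifier(text: str) -> str:
    """Hypothetical keyword classifier, easily fooled by a misspelling."""
    return "positive" if "great" in text.split() else "negative"


def to_fr(text: str) -> str:
    return toy_translate(text, EN_TO_FR)


def to_en(text: str) -> str:
    return toy_translate(text, FR_TO_EN)


perturbed = "the movie was graet"  # character-swap perturbation
undefended = toy_classifier(perturbed)
defended = defended_classify(perturbed, toy_classifier, to_fr, to_en)
print(undefended, defended)
```

In this sketch the translators are passed in as plain callables, so the toy dictionaries could be swapped for any real source-to-target and target-to-source translation functions without changing `defended_classify`.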
