Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/17224
Title: Continual End-to-End Speech-to-Text translation using augmented bi-sampler
Authors: Sarkar, Balaram
Karande, Pranav
Malviya, Ankit
Maurya, Chandresh Kumar
Keywords: Continual learning; Speech to text translation
Issue Date: 2026
Publisher: Academic Press
Citation: Sarkar, B., Karande, P., Malviya, A., & Maurya, C. K. (2026). Continual End-to-End Speech-to-Text translation using augmented bi-sampler. Computer Speech & Language, 96, 101885. https://doi.org/10.1016/j.csl.2025.101885
Abstract: Speech-to-Text (ST) translation converts speech in one language into text in another language. Earlier models for ST used a pipeline approach combining automatic speech recognition (ASR) and machine translation (MT). Such models suffer from cascaded error propagation, high latency, and high memory consumption; therefore, End-to-End (E2E) ST models were proposed. Adapting E2E ST models to new language pairs degrades performance on previously trained language pairs, a phenomenon called Catastrophic Forgetting (CF). We therefore need ST models that can learn continually. The present work proposes a novel continual learning (CL) framework for E2E ST tasks. The core idea behind our approach combines proportional-language sampling (PLS), random sampling (RS), and augmentation. RS helps the model perform well on the current task by sampling aggressively from it. PLS samples past-task data in equal proportion per language pair, but on its own it may cause over-fitting. To mitigate that, a combined PLS+RS approach is used, dubbed the continual bi-sampler (CBS). However, CBS still suffers from over-fitting due to repeated samples from past tasks, so we apply various augmentation strategies on top of CBS, which we call the continual augmented bi-sampler (CABS). We perform experiments on 4 language pairs of the MuST-C (one-to-many) and mTEDx (many-to-many) datasets and achieve gains of 68.38% and 41%, respectively, in average BLEU score over the baselines. CABS also reduces average forgetting by 82.2% on the MuST-C dataset compared to the Gradient Episodic Memory (GEM) baseline. The results show that the proposed CL-based E2E ST model ensures knowledge retention across previously trained languages. To the best of our knowledge, E2E ST models have not been studied before in a CL setup. © 2025 Elsevier B.V. All rights reserved.
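A minimal sketch of the bi-sampler idea described in the abstract (hypothetical Python: the function name cbs_batch, the 50/50 mixing ratio, and sampling with replacement are illustrative assumptions, not the authors' implementation):

```python
import random

def cbs_batch(current_task, past_tasks, batch_size=32, current_frac=0.5):
    # RS: sample aggressively (uniformly at random) from the current task.
    n_current = min(int(batch_size * current_frac), len(current_task))
    batch = random.sample(current_task, n_current)

    # PLS: give each past language pair an equal share of the remainder.
    n_past = batch_size - n_current
    if past_tasks:
        per_pair = n_past // len(past_tasks)
        for examples in past_tasks.values():
            # Sampling with replacement mirrors the repeated past samples
            # whose over-fitting CABS counters with augmentation.
            batch.extend(random.choices(examples, k=per_pair))
    random.shuffle(batch)
    return batch

# Usage: one mixed batch for a current task with two past tasks replayed.
past = {"en-fr": [f"fr{i}" for i in range(100)],
        "en-es": [f"es{i}" for i in range(100)]}
current = [f"de{i}" for i in range(1000)]
print(len(cbs_batch(current, past, batch_size=32)))  # 32 (16 RS + 8 + 8 PLS)
```

Under these assumptions, CABS would additionally pass the replayed past-task samples through augmentation before training, which is what the abstract credits with reducing over-fitting.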
URI: https://dx.doi.org/10.1016/j.csl.2025.101885
https://dspace.iiti.ac.in:8080/jspui/handle/123456789/17224
ISSN: 0885-2308
1095-8363
Type of Material: Journal Article
Appears in Collections:Department of Computer Science and Engineering

Files in This Item:
There are no files associated with this item.


