Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/17052
Title: Enhancing Hindi–English Direct Speech-to-Speech Translation with Clustering-Aided Cross-Contrastive Self-Supervised Speech Representation Learning
Authors: Maurya, Chandresh Kumar
Keywords: Cluster ensembling;Cross-contrastive speech representation learning;Direct speech-to-speech translation
Issue Date: 2025
Publisher: Springer
Citation: Gupta, M., Dutta, M., & Maurya, C. K. (2025). Enhancing Hindi–English Direct Speech-to-Speech Translation with Clustering-Aided Cross-Contrastive Self-Supervised Speech Representation Learning (Vol. 9, Issue 1). https://doi.org/10.1007/s41314-025-00078-1
Abstract: Direct speech-to-speech translation (S2ST) is an important tool for bridging communication gaps. It translates speech from one language to another without relying on intermediate text, which makes it particularly useful for languages that are primarily spoken rather than written. However, the performance of direct S2ST models on low-resource languages remains limited by the scarcity or complete absence of the parallel speech data required for training. Pre-training and fine-tuning are widely used techniques for leveraging unlabeled speech data to improve model performance. In this work, we employ a cluster-aided, cross-contrastive self-supervised learning (SSL)-based speech representation model as the pre-trained encoder, combined with a multilingual BART (mBART) decoder. The resulting fine-tuned model outperforms a baseline that uses a contrastive-loss-based SSL model as the encoder. The proposed models improve the BLEU score by 4.14% for Hindi→English and 8.2% for English→Hindi over their respective baseline models. For the English→Hindi direction, we trained a unit-vocoder on speech quantized using ensemble clustering instead of standard clustering. The resulting unit-vocoder outperformed the one trained on speech quantized using standard k-means on all evaluation metrics.
URI: https://dx.doi.org/10.1007/s41314-025-00078-1
https://dspace.iiti.ac.in:8080/jspui/handle/123456789/17052
Type of Material: Journal Article
Appears in Collections: Department of Computer Science and Engineering
