Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/14221
Full metadata record
DC Field | Value | Language
dc.contributor.author | Sethiya, Nivedita | en_US
dc.contributor.author | Maurya, Chandresh Kumar | en_US
dc.date.accessioned | 2024-08-14T10:23:44Z | -
dc.date.available | 2024-08-14T10:23:44Z | -
dc.date.issued | 2024 | -
dc.identifier.citation | Sethiya, N., Nair, S., & Maurya, C. K. (2024). Indic-TEDST: Datasets and Baselines for Low-Resource Speech to Text Translation. 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. https://www.scopus.com/inward/record.uri?eid=2-s2.0-85195993950&partnerID=40&md5=022be970a18adaff6bdbf697c0a3f886 | en_US
dc.identifier.isbn | 978-2493814104 | -
dc.identifier.other | EID(2-s2.0-85195993950) | -
dc.identifier.uri | https://dspace.iiti.ac.in/handle/123456789/14221 | -
dc.description.abstract | Speech-to-text (ST) task is the translation of speech in a language to text in a different language. It has use cases in subtitling, dubbing, etc. Traditionally, ST tasks have been solved by cascading automatic speech recognition (ASR) and machine translation (MT) models which leads to error propagation, high latency, and training time. To minimize such issues, end-to-end models have been proposed recently. However, we find that only a few works have reported results of ST models on a limited number of low-resource languages. To take a step further in this direction, we release datasets and baselines for low-resource ST tasks. Concretely, our dataset has 9 language pairs and benchmarking has been done against SOTA ST models. The low performance of SOTA ST models on Indic-TEDST data indicates the necessity of the development of ST models specifically designed for low-resource languages. © 2024 ELRA Language Resource Association: CC BY-NC 4.0. | en_US
dc.language.iso | en | en_US
dc.publisher | European Language Resources Association (ELRA) | en_US
dc.source | 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings | en_US
dc.subject | Automatic Speech Recognition | en_US
dc.subject | Indic Languages | en_US
dc.subject | Low-Resource Languages | en_US
dc.subject | Machine Translation | en_US
dc.subject | Speech-to-text Translation | en_US
dc.subject | Video Subtitling | en_US
dc.title | Indic-TEDST: Datasets and Baselines for Low-Resource Speech to Text Translation | en_US
dc.type | Conference Paper | en_US
Appears in Collections: Department of Computer Science and Engineering



