Machine Translation for Low-Resource Indic Languages
Pages
5
Time to read
14 mins
Publication
Language
English
This document is a research article presented at the Ninth Conference on Machine Translation (WMT) on improving machine translation for two low-resource Indic languages, Manipuri and Khasi. The authors describe their submission to the WMT24 Low-Resource Indic Neural Machine Translation task, which combines back-translation with fine-tuning of the pretrained multilingual model mBART. They detail how they fine-tuned mBART for English-to-Khasi and English-to-Manipuri translation, including the data augmentation strategies and quality filtering they used to enlarge and clean the training data. In their results, the English-to-Khasi model achieved the highest BLEU score, 0.0492, while the English-to-Manipuri model was held back by limited training data. The article concludes that careful fine-tuning and high-quality training data are key to better translation for low-resource languages.
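The summary names back-translation, data augmentation and quality filtering but does not say which filters the authors applied. The sketch below shows one common way these steps fit together, and it is an illustration under stated assumptions, not the authors' pipeline. Target-side monolingual text (Khasi or Manipuri) is translated into English by a reverse model to create synthetic pairs. Those pairs are merged with the authentic parallel data, and the union is screened with simple heuristics. The names `back_translate`, `filter_pairs`, `build_training_set` and `reverse_translate` are hypothetical, and the thresholds are illustrative. In practice `reverse_translate` might wrap, for instance, an mBART model fine-tuned in the target-to-English direction.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (English source, Khasi or Manipuri target)


def back_translate(
    target_monolingual: Iterable[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Build synthetic (source, target) pairs from target-side monolingual text.

    reverse_translate is a hypothetical target-to-English model. The real
    target sentence stays the training label; only the English side is synthetic.
    """
    return [(reverse_translate(t), t) for t in target_monolingual]


def filter_pairs(
    pairs: Iterable[Pair],
    max_len: int = 200,      # assumed cap on words per side
    max_ratio: float = 2.5,  # assumed cap on the word-count ratio
) -> List[Pair]:
    """Keep pairs that pass simple, commonly used quality heuristics."""
    seen = set()
    kept: List[Pair] = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # drop pairs with an empty side
        if src == tgt:
            continue  # identical sides usually mean an untranslated copy
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_len or n_tgt > max_len:
            continue  # very long segments are often misaligned
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # a large length mismatch suggests a bad alignment
        if (src, tgt) in seen:
            continue  # exact duplicate of a pair already kept
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


def build_training_set(
    authentic: Iterable[Pair],
    target_monolingual: Iterable[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Merge authentic and back-translated pairs, then filter the union."""
    synthetic = back_translate(target_monolingual, reverse_translate)
    return filter_pairs(list(authentic) + synthetic)


if __name__ == "__main__":
    # Placeholder tokens, not real Khasi or Manipuri text.
    authentic = [
        ("a b", "x y"),
        ("a b", "x y"),          # duplicate
        ("", "z"),               # empty source
        ("same", "same"),        # untranslated copy
        ("one", "w1 w2 w3 w4"),  # length ratio 4.0 exceeds 2.5
    ]

    def stub_reverse(sentence: str) -> str:
        """Stand-in for a real target-to-English model."""
        return "e1 e2"

    print(build_training_set(authentic, ["t1 t2"], stub_reverse))
    # prints [('a b', 'x y'), ('e1 e2', 't1 t2')]
```

Filtering the union rather than only the authentic data means the synthetic pairs are screened too, which matters because a weak reverse model can emit empty or degenerate English. Whitespace word counts are a crude length measure; a subword tokenizer such as mBART's would give a more consistent ratio across scripts.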