Statistical Machine Translation

Machine Translation for Low-Resource Indic Languages

Time to read: 14 mins

Publication: 10/03/24

Language: English

Summary

This research article, presented at the Ninth Conference on Machine Translation (WMT), discusses the challenges of machine translation for low-resource Indic languages, specifically Manipuri and Khasi, and methods for improving it. The authors describe their approach to the WMT24 Low-Resource Indic Neural Machine Translation task, which combines back-translation with fine-tuning of pretrained models such as mBART. They detail how they fine-tuned mBART for English-to-Khasi and English-to-Manipuri translation, including the data augmentation strategies and quality filtering methods used to improve the dataset. The English-to-Khasi model achieved the highest BLEU score, 0.0492, while the English-to-Manipuri model was held back by limited training data. The article concludes that fine-tuning and data quality are central to better translation outcomes for low-resource languages.
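The summary mentions quality filtering of augmented data but does not say which filters the authors applied. As a minimal sketch of one common heuristic for cleaning back-translated sentence pairs, the function below drops pairs that are empty, overly long, duplicated, or badly mismatched in length. The function name `filter_pairs`, the whitespace tokenization, and all thresholds are assumptions made for illustration, not details taken from the article.

```python
def filter_pairs(pairs, min_ratio=0.5, max_ratio=2.0, min_tokens=1, max_tokens=100):
    """Keep (source, target) pairs whose token lengths look plausible.

    Hypothetical heuristic: thresholds are illustrative defaults,
    not values reported in the article.
    """
    kept = []
    seen = set()
    for src, tgt in pairs:
        # Whitespace tokenization is an assumption; a real pipeline
        # would likely use the model's own subword tokenizer.
        src_len = len(src.split())
        tgt_len = len(tgt.split())

        # Drop pairs where either side is empty or too long.
        if not (min_tokens <= src_len <= max_tokens):
            continue
        if not (min_tokens <= tgt_len <= max_tokens):
            continue

        # Drop pairs whose length ratio suggests a misaligned or
        # truncated translation.
        ratio = src_len / tgt_len
        if not (min_ratio <= ratio <= max_ratio):
            continue

        # Drop exact duplicates, which are common in synthetic data.
        key = (src.strip(), tgt.strip())
        if key in seen:
            continue
        seen.add(key)
        kept.append((src, tgt))
    return kept
```

Applied to a list of synthetic pairs before fine-tuning, a filter like this trades some data volume for cleaner supervision, which matters most when the genuine parallel corpus is small.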
