Hybrid Distillation for Low-Resource Language Translation
Pages: 10
Time to read: 30 mins
Language: English
This technical report describes the Helsinki-NLP team's participation in the 2024 Shared Task on Translation into Low-Resource Languages of Spain. The task is to develop Machine Translation (MT) models that translate from Spanish into three endangered Romance languages: Aranese, Aragonese, and Asturian. The report outlines the team's data collection, augmentation, and preparation strategies, which include gathering additional data from Wikipedia and online dictionaries and generating back-translations. The team submitted four multilingual models that use Knowledge Distillation (KD) to combine the outputs of neural and rule-based systems, balancing translation quality against computational efficiency. The distilled models ranked competitively in the open submission track. The report also benchmarks existing models and evaluates both translation quality and efficiency, contributing to the development of resources for under-resourced languages.
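The summary names two data-side techniques: back-translation, and knowledge distillation that merges the outputs of neural and rule-based teacher systems into training data for a smaller student model. The Python sketch below shows one generic way to build such training pairs. It assumes sequence-level distillation in which every teacher's translation is kept as its own (source, target) pair, i.e. a union of teacher outputs. The function names (`distill_pairs`, `back_translate_pairs`), the deduplication step, and the length-ratio filter are hypothetical illustrations, not the team's documented pipeline. Teachers are represented only by their output lists, and the reverse (target-to-Spanish) model only by a plain callable.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# (Spanish source, target-language sentence); hypothetical type alias
Pair = Tuple[str, str]


def _length_ok(src: str, tgt: str, max_ratio: float) -> bool:
    """Hypothetical noise filter: reject empty sides or very unequal lengths."""
    a, b = len(src.split()), len(tgt.split())
    if a == 0 or b == 0:
        return False
    return max(a, b) / min(a, b) <= max_ratio


def distill_pairs(
    sources: List[str],
    teacher_outputs: Dict[str, List[str]],
    max_len_ratio: float = 2.0,
) -> List[Pair]:
    """Multi-teacher sequence-level KD data (assumed union variant).

    Each teacher's translation of a source sentence becomes a separate
    training pair for the student. Identical outputs from different
    teachers are kept once, and pairs whose whitespace-token length ratio
    exceeds max_len_ratio are dropped.
    """
    for name, outputs in teacher_outputs.items():
        if len(outputs) != len(sources):
            raise ValueError(
                f"teacher {name!r}: {len(outputs)} outputs for {len(sources)} sources"
            )
    pairs: List[Pair] = []
    for i, src in enumerate(sources):
        # dict.fromkeys deduplicates while preserving teacher order
        candidates = dict.fromkeys(outputs[i] for outputs in teacher_outputs.values())
        for hyp in candidates:
            if _length_ok(src, hyp, max_len_ratio):
                pairs.append((src, hyp))
    return pairs


def back_translate_pairs(
    targets: Iterable[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Back-translation: translate monolingual target-language text into
    Spanish with a reverse model, then pair the synthetic Spanish source
    with the authentic target sentence."""
    return [(reverse_translate(tgt), tgt) for tgt in targets]
```

For example, `distill_pairs(["a b c"], {"nmt": ["x y z"], "rbmt": ["x y w"]})` returns both teacher outputs as separate pairs for the one source sentence. Keeping the union lets the student see both the neural and the rule-based renderings; a selection rule (e.g. scoring candidates with a quality-estimation model and keeping only the best) is an equally plausible alternative that trades data volume for consistency.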