Statistical Machine Translation
Findings of the WMT 2024 Shared Task
Pages
8
Time to read
26 mins
Publication
Language
English
This technical report presents the results of the WMT 2024 shared task organized by the Open Language Data Initiative (OLDI). The task aimed to extend and improve the FLORES+ and MT Seed multilingual datasets, which serve as foundational resources for language technology development. Ten submissions covering 16 languages took part; they extended the datasets and improved the quality of existing data. The report discusses the importance of high-quality datasets for under-served languages and describes the methodologies used for data collection and validation. Contributions were required to be released under open licenses so that they remain accessible to language communities. The report also describes the two datasets involved: FLORES+, which aims to provide evaluation benchmarks for multilingual translation, and MT Seed, which serves as a source of training data for languages lacking sufficient resources. It concludes with a summary of the submissions and their contributions to the shared task.