Statistical Machine Translation
Evaluation of Contextual Machine Translation Systems
Pages
15
Time to read
46 mins
Publication
Language
English
This technical report investigates the effectiveness of contextual machine translation (MT) systems, focusing on whether document-level annotations are necessary for training. The authors, Matt Post and Marcin Junczys-Dowmunt from Microsoft, start from a known limitation of traditional sentence-level MT systems: without context, they often fail to resolve translation ambiguities. The study builds large-scale contextual MT systems for German, French, and Russian, using both parallel and back-translated data. The key finding is that sourcing contextual training samples exclusively from back-translated data yields the best results. The report also argues for generative evaluation over contrastive metrics, since the latter may not accurately reflect a system's translation capabilities. Finally, the authors discuss the challenges of evaluating contextual translations and the need for datasets rich in discourse-level phenomena to improve assessment accuracy. Overall, the report offers guidance on both the construction and the evaluation of contextual translation systems.