
Statistical Machine Translation

Machine Translation Evaluation Benchmark for Wu Chinese

Time to read: 19 mins
Publication: 10/14/24
Language: English

Summary

This technical report presents a machine translation evaluation benchmark developed specifically for Wu Chinese. It describes the creation of a new dataset that serves both as a training corpus and as an evaluation benchmark for machine translation models. The report details its contributions to the FLORES+ dataset: an open-source, manually translated dataset; comprehensive documentation of the dataset creation process; and validation experiments. It also presents preliminary tools for Wu Chinese normalization and segmentation, and discusses the benefits and limitations of the dataset as well as its implications for other under-resourced languages. The methodology used to construct the dataset is described, including the translation process and the dialectal considerations specific to the Chongming dialect. The report concludes by discussing the experimental results and suggesting directions for future work on Wu Chinese machine translation.
