Lost in post-editing: Identifying the MT error types that are most problematic for student post-editors

Publication type
V
Publication status
Published
Authors
Daems, J., Macken, L., Vandepitte, S., & Hartsuiker, R.

Abstract

Post-editing machine translation (MT) is an important step towards high-quality translations. To better understand the post-editing process, we need to understand the relationship between MT output and the post-editing of that output. While automatic evaluation metrics such as the widely used BLEU can be used to compare the overall quality of different MT systems, a more detailed error analysis is necessary to identify the typical errors that appear in MT output and in the subsequent post-editing.

By using a fine-grained Translation Quality Assessment approach and grouping translation errors into source text-related error sets, we link MT errors to errors after post-editing to examine their relationship. We are mainly interested in answering the following questions: Which (and how many) MT errors are solved by post-editors, which (and how many) problems occur in post-edited MT, and which (and how many) of these originate from MT?

We present the results of a pilot study in which student translators post-edited newspaper articles and user documentation from English into Dutch. We found that the MT errors that student post-editors most easily correct are grammatical errors, whereas other error types, such as wrong collocations, word sense disambiguation errors and misspelled compounds, prove to be more challenging. These findings identify the types of errors that post-editor training should pay more attention to.