It’s all in the eyes: an eye tracking experiment to assess the readability of machine translated literature

Publication type
U
Publication status
Published
Authors
Colman, T., Fonteyne, M., Daems, J., & Macken, L.
Conference
31st Meeting of Computational Linguistics in The Netherlands (CLIN 31) (Ghent, Belgium)

Abstract

With the arrival of neural machine translation (NMT) systems, translation quality has improved enormously. Despite these improvements, remarkable differences can still be observed between human and machine translations, especially for more creative text types such as literary texts. Webster et al. (2020) compared the modern Dutch human translations of four classic novels with their machine translated versions generated by Google Translate and DeepL. They found not only that a large proportion of the machine translated sentences contained errors, but also that the NMT output showed a lower level of lexical richness and local cohesion than the human translations. The most frequent errors observed in their data set were mistranslation (37%), coherence (32%), and style & register (13%) errors. This top three corresponds to the findings of Tezcan et al. (2019) and Fonteyne et al. (2020), who both discussed the quality of Agatha Christie’s The Mysterious Affair at Styles, translated by Google’s neural machine translation system from English into Dutch. In this poster presentation, we report on the experimental design of an eye-tracking study in which participants read the full novel (Agatha Christie’s The Mysterious Affair at Styles) in Dutch, alternating between a machine translation (MT) and a human translation (HT). We aim to compare the reading process of participants reading both versions, and to analyse to what extent MT impacts the reading process. As a human annotator has marked and classified all errors in the machine translated version of the novel (Fonteyne et al., 2020), we will also be able to study which errors impact this reading process most. The data set expands the Ghent Eye-Tracking Corpus (Cop et al., 2017), which contains eye-tracking data of participants reading Agatha Christie’s novel in English and in its Dutch (human) translation.