GECO-MT : the Ghent Eye-tracking Corpus of Machine Translation

Publication type
P1
Publication status
Published
Authors
Colman, T., Fonteyne, M., Daems, J., Dirix, N., & Macken, L.
Editor
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk and Stelios Piperidis
Series
LREC 2022 : THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
Pagination
29-38
Publisher
European Language Resources Association (ELRA) (Marseille, France)
Conference
13th Conference on Language Resources and Evaluation (LREC 2022) (Marseille, France)
Project
ArisToCAT

Abstract

In the present paper, we describe a large corpus of eye movement data, collected during natural reading of a human translation and a machine translation of a full novel. This data set, called GECO-MT (Ghent Eye-tracking Corpus of Machine Translation), expands upon an earlier corpus called GECO (Ghent Eye-tracking Corpus) by Cop et al. (2017). The eye movement data in GECO-MT will be used in future research to investigate the effect of machine translation on the reading process and the effects of various error types on reading. In this article, we describe in detail the materials and data collection procedure of GECO-MT. We give extensive information on the language proficiency of our participants and compare them with the participants of the original GECO. We investigate the distribution of a selection of important eye movement variables and explore the possibilities for future analyses of the data. GECO-MT is freely available at https://www.lt3.ugent.be/resources/geco-mt.