GECO-MT : the Ghent Eye-tracking Corpus of Machine Translation

Publication type: P1
Publication status: Published
Authors: Colman, T., Fonteyne, M., Daems, J., Dirix, N., & Macken, L.
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk and Stelios Piperidis
Series: LREC 2022 : THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
Pagination: 29-38
Publisher: European Language Resources Association (ELRA) (Marseille, France)
Conference: 13th Conference on Language Resources and Evaluation (LREC 2022) (Marseille, France)
Project: ArisToCAT

Abstract

In the present paper, we describe a large corpus of eye movement data, collected during natural reading of a human translation and a machine translation of a full novel. This data set, called GECO-MT (Ghent Eye-tracking Corpus of Machine Translation), expands upon an earlier corpus called GECO (Ghent Eye-tracking Corpus) by Cop et al. (2017). The eye movement data in GECO-MT will be used in future research to investigate the effect of machine translation on the reading process and the effects of various error types on reading. In this article, we describe in detail the materials and data collection procedure of GECO-MT. Extensive information on the language proficiency of our participants is given, as well as a comparison with the participants of the original GECO. We investigate the distribution of a selection of important eye movement variables and explore the possibilities for future analyses of the data. GECO-MT is freely available at https://www.lt3.ugent.be/resources/geco-mt.
