LeConTra: a learner corpus of English-to-Dutch news translation

Publication type: P1
Publication status: Published
Authors: Vanroy, B., & Macken, L.
Editor: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk and Stelios Piperidis
Series: LREC 2022 : THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
Pagination: 1807-1816
Publisher: European Language Resources Association (ELRA) (Marseille, France)
Conference: 13th Conference on Language Resources and Evaluation (LREC 2022) (Marseille, France)

Abstract

We present LeConTra, a learner corpus consisting of English-to-Dutch news translations enriched with translation process data. Three students of a Master's programme in Translation were asked to translate 50 different English journalistic texts of approximately 250 tokens each. Because we also collected translation process data in the form of keystroke logging, our dataset can be used as part of different research strands such as translation process research, learner corpus research, and corpus-based translation studies. Reference translations, without process data, are also included. The data has been manually segmented and tokenized, and manually aligned at both segment and word level, leading to a high-quality corpus with token-level process data. The data is freely accessible via the Translation Process Research DataBase, which emphasises our commitment to distributing our dataset. The tool that was built for manual sentence segmentation and tokenization, Mantis, is also available as an open-source aid for data processing.
