Post-editing has become a common alternative to translation from scratch, especially when time is of the essence. More and more post-editing research is being conducted, yet this research often gives rise to new questions specific to the post-editing process and its consequences (Dillinger, 2014). According to Dillinger (2014), the advancement of technological research tools such as keystroke loggers and eye trackers has revolutionized translation research: we are now able to examine the translation and post-editing process in much greater detail, which in turn raises ever more specific questions.
Whereas the benefit of using post-editing in a technical context has been established (Plitt & Masselot, 2010), the potential benefit of post-editing for general text types is still underresearched, with a few exceptions (Elming, Balling, & Carl, 2014). The present study aims to fill this gap by comparing human translation with post-editing for general texts (newspaper articles) for the English-Dutch language pair. We try to answer the following questions: Is post-editing faster than human translation? Is there a difference in quality between the product of human translation and the product of post-edited machine translation output? Is there a difference in translation units between human translation and post-editing? Is post-editing cognitively more demanding than human translation? Is gaze behavior different for post-editing and human translation (Mesa-Lao, 2014)? And are more (or other) external resources consulted in human translation than in post-editing? Additionally, we wish to gain a better understanding of the post-editing process itself by linking our findings back to the quality of the original machine translation output. More specifically, we want to find out whether there is a relationship between the quality of the initial machine translation output and the frequency of source text consultation, and whether the quality of the original machine translation output correlates with the quality of the final product after post-editing.
We report on a study with ten master's students of translation, each of whom translated four texts and post-edited four texts. The design was balanced so that we obtained five post-edited versions and five human translations of each text. All students had passed their general translation exam and had no previous post-editing experience. The texts were newspaper articles with comparable Lexile scores, selected from Newsela. The final text selection was made by identifying common translation problems and assessing the quality of the machine translation output for each text. Before and after the experiment, participants filled out surveys providing us with metadata. By using CasMaCat, a state-of-the-art translator workbench with additional keystroke logging functions and eye-tracking integration, we were able to examine various aspects of the translation and post-editing process: the number of fixations on source and target text, the average gaze duration on source and target text, the number of production units and the number of words within each production unit, the time spent on parallel reading and writing activity, and the time needed to translate a segment (with or without pauses longer than 5 seconds). Additional logging with Inputlog, a keystroke logger capable of logging different programs and browser tabs, provides us with information on the external resources consulted during translation. Cognitive load is examined through the analysis of gaze data and the time spent on parallel activity, supplemented with subjective ratings of cognitive load taken from the surveys. To assess and compare the quality of the final products, we applied our two-step translation quality assessment approach (Daems, Macken, & Vandepitte, 2013), which allows us to compare quality with respect to issues of adequacy as well as issues of acceptability.
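To illustrate the per-segment timing measure described above (segment time with and without pauses longer than 5 seconds), the computation from logged keystroke timestamps could be sketched as follows. This is a minimal illustration, not the actual CasMaCat or Inputlog processing code; the function name and data shape are hypothetical:

```python
def segment_time(timestamps, pause_threshold=5.0):
    """Production time for one segment from its keystroke log.

    timestamps: sorted keystroke times in seconds for one segment.
    Returns (total_time, active_time), where active_time excludes
    all inter-keystroke pauses longer than pause_threshold seconds.
    """
    if len(timestamps) < 2:
        return 0.0, 0.0
    total = timestamps[-1] - timestamps[0]
    # Sum only the inter-keystroke gaps at or below the pause threshold.
    active = sum(
        gap
        for gap in (b - a for a, b in zip(timestamps, timestamps[1:]))
        if gap <= pause_threshold
    )
    return total, active
```

For example, a segment with keystrokes at 0, 1, 2, 9, and 10 seconds yields a total time of 10 seconds but an active time of 3 seconds, since the 7-second pause exceeds the threshold and is excluded.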