Building a new-generation corpus for empirical translation studies: the Dutch Parallel Corpus 2.0

Publication type: B2
Publication status: Published
Authors: Reynaert, R., Macken, L., Tezcan, A., & De Sutter, G.
Editors: Vincent Wang, Lily Lim and Defeng Li
Series: New perspectives on corpus translation studies
Pagination: 75-100
Publisher: Springer (Singapore)

Abstract

This chapter introduces a new, updated version of the Dutch Parallel Corpus, a bidirectional parallel corpus of expert translations for the Dutch><English and Dutch><French language pairs. This revised version of the corpus, which we dub Dutch Parallel Corpus 2.0, is dynamic in nature and contains 2.75 million words at the time of writing. The corpus is sentence-aligned, lemmatized and POS-tagged using the state-of-the-art natural language processing toolkit Stanza. Compared to its predecessor, the Dutch Parallel Corpus 2.0 contains more metadata about the translators (e.g. gender, education, experience) and the translation projects (e.g. L1/L2 translation, software used, degree and type of revision), in addition to the traditional metadata about the texts themselves (e.g. source and target language, intended audience, intended goal, register). This extensive set of metadata, together with a more principled and flexible register classification, is considered the main asset of the corpus. It allows corpus-based translation scholars to answer more refined research questions about the linguistic and contextual factors that shape translated texts, and ultimately fosters ideas and theories about the social and cognitive processes involved in translation performance. The corpus is freely available for research purposes via https://www.dpc2.ugent.be/.
