Dutch Parallel Corpus: MT Corpus and translator's aid

Publication type
C1
Publication status
Published
Authors
Macken, L., Trushkina, J., & Rura, L.
Editor
B. Maegaard
Series
Proceedings of Machine Translation Summit XI
Pagination
313-320
Publisher
European Association for Machine Translation (Copenhagen, Denmark)
Project
DPC

Abstract

This paper reports on the development of the Dutch Parallel Corpus (DPC): a high-quality, sentence-aligned parallel corpus of 10 million words for the language pairs Dutch-English and Dutch-French. The corpus is composed of different text types. All processing steps, including alignment and linguistic annotation, undergo quality control at different levels. Four categories of potential DPC users can be distinguished: developers of HLT applications, linguists conducting more fundamental research, human translators, and language learners. This paper focuses on two types of intended users: MT developers and human translators. It describes the characteristics of the corpus relevant to such users, concentrating on corpus design, processing of the corpus data, and exploitation of the corpus.