Annotating the Dutch Parallel Corpus

Publication type
C1
Publication status
Published
Authors
Paulussen, H., & Macken, L.
Editor
Lars Ahrenberg, Jörg Tiedemann, and Martin Volk
Journal
Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora (AEPC)
Series
NEALT Proceedings Series, 10
Pagination
63-72
Publisher
Northern European Association for Language Technology (NEALT)
Project
DPC

Abstract

The Dutch Parallel Corpus (DPC) is a translation corpus containing Dutch, English and French text samples aligned at sentence level. In addition to sentence alignment, the corpus has been grammatically annotated, which makes it easier to exploit in different domains, including natural language processing, translation research and CALL (computer-assisted language learning). In this paper, we describe the compilation of DPC and the alignment procedures used. This is followed by a description of the annotation task for the three languages, which required different tools and different tag sets. Finally, the impact of different grammatical annotations on multilingual corpus exploitation is discussed.