An annotation scheme and Gold Standard for Dutch-English word alignment

Publication type
C1
Publication status
Published
Author
Macken, L.
Editor
N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner, & D. Tapias
Series
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Pagination
3369-3374
Publisher
European Language Resources Association (Valletta, Malta)

Abstract

The importance of sentence-aligned parallel corpora has been widely acknowledged. Reference corpora in which sub-sentential translational correspondences are indicated manually are more labour-intensive to create and are therefore less widespread. Such manually created reference alignments, also called Gold Standards, have been used in research projects to develop or test automatic word alignment systems. In most translations, translational correspondences are rather complex; for example, word-by-word correspondences can be found for only a limited number of words. A reference corpus in which these complex translational correspondences are aligned manually is therefore also a useful resource for the development of translation tools and for translation studies. In this paper, we describe how we created a Gold Standard for the Dutch-English language pair. We present the annotation scheme, the annotation guidelines, the annotation tool and the inter-annotator agreement results. To cover a wide range of syntactic and stylistic phenomena that emerge from different writing and translation styles, our Gold Standard data set contains texts from different text types. The Gold Standard will be publicly available as part of the Dutch Parallel Corpus.