Dutch Parallel Corpus: a multifunctional and multilingual corpus
- Paulussen, H., Macken, L., Trushkina, J., Desmet, P., & Vandeweghe, W.
- Cahiers de l'Institut de Linguistique de Louvain
- Peeters (Louvain-la-Neuve, Belgium)
Text corpora now play an important role in language research and in all fields involving language study, including theoretical and applied linguistics, language technology, translation studies and CALL (Computer-Assisted Language Learning). Multilingual corpora, especially translated corpora, are not always readily available for Dutch. Much depends on the private initiative of individuals, and the data are often available only under restrictions. The DPC project (Dutch Parallel Corpus), which is carried out within the STEVIN program (Odijk et al. 2004), intends to fill the gap for this type of corpus for Dutch. This paper gives an overview of the DPC project. First, the main parallel corpora containing Dutch are surveyed and discussed. Then the DPC project is described, focusing on those aspects that make the DPC different from existing parallel corpora. Finally, the choice of an XML-based format is explained.