KNACK-2002: a Richly Annotated Corpus of Dutch Written Text

Publication type
C1
Publication status
Published
Authors
Hoste, V., & De Pauw, G.
Series
Proceedings of the Fifth Conference on International Language Resources and Evaluation (LREC'06)
Publisher
European Language Resources Association (Genova, Italy)

Abstract

In this paper, we introduce the annotated KNACK-2002 corpus of Dutch written text. The corpus features five annotation layers: morphological boundaries at the word level, part-of-speech tags and phrase chunks at the syntactic level, named entities at the semantic level, and coreferential relations at the discourse level. We believe the corpus is unique in the Dutch language area because of the richness of its annotation layers, which provide researchers with a useful gold-standard data set for different NLP tasks in the domains of morphology, (morpho)syntax, semantics and discourse.