A Coreference Corpus and Resolution System for Dutch

Hendrickx, I., Bouma, G., Coppens, F., Daelemans, W., Hoste, V., Kloosterman, G., Mineur, A., Van Der Vloet, J., & Verschelde, J.
N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, and D. Tapias
Proceedings of the Sixth Conference on International Language Resources and Evaluation (LREC'08)
European Language Resources Association (Marrakech, Morocco)


We present the main outcomes of the COREA project: a corpus annotated with coreferential relations and a coreference resolution system for Dutch. We discuss the annotation of the corpus: the types of annotated relations, the guidelines, the annotation tool, and inter-annotator agreement. We also show a visualization of the annotated relations. The standard approach to evaluating a coreference resolution system is to compare its predictions against a hand-annotated gold-standard test set (using cross-validation). A more practically oriented evaluation is to test how useful coreference relation information is in an NLP application. We present results of both types of evaluation. We run experiments with an Information Extraction module for the medical domain and measure its performance with and without coreference relation information. In a separate experiment, we also evaluate the effect of coreference information produced by a simple rule-based coreference module in a Question Answering application.