Constructing a cross-document event coreference corpus for Dutch

Publication type
Publication status
In press
De Langhe, L., De Clercq, O., & Hoste, V.


Event coreference resolution is the task of automatically linking different text fragments that refer to the same real-world event. It can be performed not only within a single document but also across documents, and it can serve as a basis for many useful Natural Language Processing applications. Resources for this type of research, however, are extremely limited. We compiled the first large-scale dataset for cross-document event coreference resolution in Dutch, comparable in size to the most widely used English event coreference corpora. Because data for event coreference is notoriously sparse, we took additional steps to maximize the number of coreference links in our corpus. Due to the complex nature of event coreference resolution, many algorithms are pipeline architectures that rely on a series of upstream tasks such as event detection, event argument identification and argument coreference. We tackle the task of event argument coreference both to illustrate the potential of our compiled corpus and to lay the groundwork for a future Dutch event coreference resolution system. Results show that existing NLP algorithms can be easily retrofitted to contribute to the subtasks of an event coreference resolution pipeline system.
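
As a purely illustrative aside, the notion of cross-document coreference links described above can be sketched as a small data structure. The sketch below is not the corpus format or the authors' method: the `Mention` type, the `build_chains` helper and the example mentions are all hypothetical, and the grouping uses a generic union-find over pairwise links.

```python
from collections import defaultdict
from typing import NamedTuple


class Mention(NamedTuple):
    """Hypothetical event mention: source document plus surface text."""
    doc_id: str
    text: str


def build_chains(mentions, links):
    """Group mentions into coreference chains from pairwise links (union-find)."""
    parent = {m: m for m in mentions}

    def find(m):
        # Walk up to the chain representative, shortening the path on the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        parent[find(a)] = find(b)

    chains = defaultdict(list)
    for m in mentions:
        chains[find(m)].append(m)
    return list(chains.values())


# Invented toy data, not taken from the corpus.
m1 = Mention("doc1", "the earthquake")
m2 = Mention("doc2", "the quake")
m3 = Mention("doc2", "the election")
m4 = Mention("doc3", "Sunday's tremor")

chains = build_chains([m1, m2, m3, m4], [(m1, m2), (m2, m4)])

# A chain is cross-document when its mentions come from more than one document.
cross_doc = [c for c in chains if len({m.doc_id for m in c}) > 1]
```

Here the three earthquake mentions end up in one chain spanning three documents, while the unlinked mention forms a singleton chain. Sparsity in this setting means that most mentions stay singletons unless documents are selected to cover the same events.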