Parallel corpora make sense: bypassing the knowledge acquisition bottleneck for word sense disambiguation

Publication type
A1
Publication status
Published
Authors
Lefever, E., & Hoste, V.
Journal
International Journal of Corpus Linguistics
Volume
19
Issue
3
Pagination
333-367
Publisher
John Benjamins Publishing Company

Abstract

We present a multilingual approach to Word Sense Disambiguation (WSD), the task of automatically assigning the contextually appropriate sense to a given word. Instead of using a predefined monolingual sense inventory, we use a language-independent framework that derives the senses of a given word from word alignments on a multilingual parallel corpus, which we have made available for corpus linguistics research. We built five WSD systems with English as the input language and translations in five supported languages (viz. French, Dutch, Italian, Spanish and German) as senses. The systems incorporate both binary translation features and local context features. The experimental results are very competitive, which confirms our initial hypothesis that each language contributes to the disambiguation of polysemous words. Because our system extracts all information from the parallel corpus, it offers a flexible language-independent approach, which implicitly deals with the sense distinctness issue and allows us to bypass the knowledge acquisition bottleneck for WSD.
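As a rough illustration of the idea in the abstract, the sketch below derives a sense inventory from word-aligned translations and builds one training instance that combines local context features with binary translation features. It is not the authors' implementation: the toy data, the function names, the two-token context window, and the reading of "binary translation features" as indicators for the aligned translations in the other languages are all assumptions made for this example.

```python
from collections import Counter

# Toy word-aligned data, invented for illustration (not taken from the corpus).
# Each item: (English tokens, index of the target word, {language: aligned translation}).
ALIGNED = [
    (["the", "bank", "raised", "interest", "rates"], 1, {"fr": "banque", "nl": "bank"}),
    (["we", "walked", "along", "the", "river", "bank"], 5, {"fr": "rive", "nl": "oever"}),
    (["the", "bank", "approved", "the", "loan"], 1, {"fr": "banque", "nl": "bank"}),
]


def build_sense_inventory(aligned, language):
    """Count the translations aligned to the target word; each one acts as a sense label."""
    return Counter(trans[language] for _, _, trans in aligned if language in trans)


def local_context_features(tokens, idx, window=2):
    """Binary features for the words within `window` positions of the target word."""
    feats = {}
    for offset in range(-window, window + 1):
        pos = idx + offset
        if offset != 0 and 0 <= pos < len(tokens):
            feats[f"ctx[{offset:+d}]={tokens[pos]}"] = 1
    return feats


def translation_features(translations, target_language):
    """Binary features for the aligned translations in every other language (assumed design)."""
    return {
        f"trans[{lang}]={word}": 1
        for lang, word in translations.items()
        if lang != target_language
    }


def extract_instance(tokens, idx, translations, target_language):
    """Return (feature dict, sense label) for one occurrence of the target word."""
    feats = local_context_features(tokens, idx)
    feats.update(translation_features(translations, target_language))
    return feats, translations[target_language]


if __name__ == "__main__":
    print(build_sense_inventory(ALIGNED, "fr"))
    print(extract_instance(*ALIGNED[1], "fr"))
```

Feature dictionaries of this shape could then be passed to any standard classifier, with one classifier trained per target language so that the translation in that language serves as the sense label.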