Improving domain-specific cross-lingual embeddings with automatically generated bilingual dictionaries

Publication type
A2
Publication status
Published
Authors
Singh, P., Rigouts Terryn, A., & Lefever, E.
Journal
Computational Linguistics in the Netherlands Journal
Volume
12
Pagination
125-140

Abstract

This paper reports on a set of proof-of-concept experiments performed to evaluate and improve the alignment of monolingual embeddings for a specialised domain, viz. the medical use case of heart failure. The presented approach, which creates domain-specific dictionaries on the fly from cross-lingual Wikipedia links, achieves good results for cross-lingual alignment of this specialised vocabulary in three language pairs: English-Dutch, English-French, and Dutch-French. The experimental results show that the setup incorporating a smaller but dedicated domain-specific dictionary outperforms the alignment incorporating a larger but general-domain seed dictionary. A detailed error analysis reveals that many potentially useful (near-)equivalents are found beyond those present in the gold standard, and it suggests strategies for further improvement, such as lemmatisation and improved tokenisation.
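
To illustrate what aligning monolingual embeddings with a seed dictionary can look like, the sketch below uses orthogonal Procrustes, a common supervised alignment technique. The abstract does not state which alignment method the paper uses, so this is an assumption made for illustration only and not the authors' setup. The vocabulary, the 3-dimensional random vectors, and the function names (`learn_orthogonal_map`, `align_with_seed_dictionary`, `nearest_target`) are all hypothetical toy choices.

```python
import numpy as np


def learn_orthogonal_map(src_matrix, tgt_matrix):
    """Orthogonal Procrustes: find orthogonal W minimising ||src @ W - tgt||_F."""
    u, _, vt = np.linalg.svd(src_matrix.T @ tgt_matrix)
    return u @ vt


def align_with_seed_dictionary(src_emb, tgt_emb, seed_pairs):
    """Learn a mapping from the seed pairs whose words exist in both vocabularies."""
    usable = [(s, t) for s, t in seed_pairs if s in src_emb and t in tgt_emb]
    src_matrix = np.stack([src_emb[s] for s, _ in usable])
    tgt_matrix = np.stack([tgt_emb[t] for _, t in usable])
    return learn_orthogonal_map(src_matrix, tgt_matrix)


def nearest_target(word, src_emb, tgt_emb, mapping):
    """Map a source word into the target space and return the closest target word."""
    query = src_emb[word] @ mapping

    def cosine(vec):
        return float(query @ vec / (np.linalg.norm(query) * np.linalg.norm(vec)))

    return max(tgt_emb, key=lambda w: cosine(tgt_emb[w]))


# Toy data (assumption): Dutch vectors are an exact rotation of the English ones,
# which real embeddings never are. Word pairs are illustrative only.
rng = np.random.default_rng(0)
en_words = ["heart", "failure", "valve", "artery", "ventricle"]
nl_words = ["hart", "falen", "klep", "slagader", "ventrikel"]
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
en_emb = {w: rng.normal(size=3) for w in en_words}
nl_emb = {nl: en_emb[en] @ rotation for en, nl in zip(en_words, nl_words)}

# Small "domain-specific" seed dictionary: four pairs, one pair held out.
seed_pairs = list(zip(en_words[:4], nl_words[:4]))
mapping = align_with_seed_dictionary(en_emb, nl_emb, seed_pairs)
held_out_translation = nearest_target("ventricle", en_emb, nl_emb, mapping)
print(held_out_translation)
```

Because the toy target space is an exact rotation of the source space, the held-out word is mapped onto its counterpart. With real embeddings the mapping is only approximate, which is why the choice and coverage of the seed dictionary matters.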