The automatic determination of translation equivalents in lexicography: What works and what doesn’t?

Publication type
U
Publication status
Published
Authors
Denisová, M., de Schryver, G-M, & Rychlý, P.
Editor
Kristina Š. Despot, Ana Ostroški Anić and Ivana Brač
Series
Lexicography and Semantics: Book of Abstracts of the XXI EURALEX International Congress, 8–12 October 2024, Cavtat, Croatia
Pagination
253
Publisher
Institute for the Croatian Language (Zagreb)
Conference
XXI EURALEX International Congress (Zagreb)

Abstract

Cross-lingual embedding models act as facilitators of lexical knowledge transfer and offer many advantages, notably their applicability to low-resource and non-standard language pairs, making them a valuable tool for retrieving translation equivalents in lexicography. Despite their potential, these models have primarily been developed with a focus on Natural Language Processing (NLP), leading to significant issues, including flawed training and evaluation data, as well as inadequate evaluation metrics and procedures. In this paper, we introduce cross-lingual embedding models for lexicography, addressing the challenges and limitations inherent in the current NLP-focused research. We demonstrate the problematic aspects across three baseline cross-lingual embedding models and three language pairs and outline possible solutions. We show the importance of high-quality data, arguing that it plays a more vital role than algorithmic optimisation in enhancing the effectiveness of these models.