Investigating the quality of static anchor embeddings from transformers for under-resourced languages

Publication type
C1
Publication status
Published
Authors
Singh, P., De Clercq, O., & Lefever, E.
Editor
Maite Melero, Sakriani Sakti and Claudia Soria
Series
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
Pagination
176-184
Publisher
European Language Resources Association (ELRA) (Marseille, France)
Conference
LREC 2022 Workshop: the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages (SIGUL2022) (Marseille, France)

Abstract

This paper reports on experiments on cross-lingual transfer using the anchor-based approach of Schuster et al. (2019) for English and a low-resourced language, namely Hindi. For the sake of comparison, we also evaluate the approach on three very different higher-resourced languages, viz. Dutch, Russian and Chinese. Although the approach was initially designed for ELMo embeddings, we analyze it for the more recent BERT family of transformers on a variety of tasks, both monolingual and cross-lingual. The results largely show that, like most other cross-lingual transfer approaches, the static anchor approach is underwhelming for the low-resource language, while performing adequately for the higher-resourced ones. We attempt to provide insights into both the quality of the anchors and the performance on low-shot cross-lingual transfer to better understand this performance gap. We make the extracted anchors and the modified train and test sets available for future research at https://github.com/pranaydeeps/Vyaapak
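
To illustrate the general idea of static anchor embeddings, the following is a minimal sketch of how anchors can be derived from a BERT-style transformer by averaging contextual token embeddings over all occurrences of a token in a corpus. The model name, layer choice (last hidden layer) and toy corpus are illustrative assumptions and do not reflect the exact setup of the paper.

from collections import defaultdict

import torch
from transformers import AutoModel, AutoTokenizer

# Assumed model; the paper evaluates several BERT-family transformers.
MODEL_NAME = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def extract_anchors(sentences):
    """Average last-layer contextual embeddings of each token over its occurrences."""
    sums = defaultdict(lambda: None)
    counts = defaultdict(int)
    with torch.no_grad():
        for sent in sentences:
            enc = tokenizer(sent, return_tensors="pt", truncation=True)
            hidden = model(**enc).last_hidden_state.squeeze(0)  # (seq_len, dim)
            tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
            for tok, vec in zip(tokens, hidden):
                if tok in tokenizer.all_special_tokens:
                    continue  # skip [CLS], [SEP], padding, etc.
                sums[tok] = vec.clone() if sums[tok] is None else sums[tok] + vec
                counts[tok] += 1
    # Static anchor = mean contextual vector per token
    return {tok: sums[tok] / counts[tok] for tok in sums}

# Example usage on a toy corpus:
# anchors = extract_anchors(["The cat sat on the mat.", "A cat slept in the sun."])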