Language technologies for low resource languages: sociolinguistic and multilingual insights

Publication type
C1
Publication status
Published
Authors
Doğruöz, A.S., & Sitaram, S.
Editor
Maite Melero, Sakriani Sakti and Claudia Soria
Series
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
Pagination
92-97
Publisher
European Language Resources Association (ELRA) (Marseille, France)
Conference
LREC 2022 Workshop: the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages (SIGUL2022) (Marseille, France)

Abstract

There is a growing interest in building language technologies (LTs) for low resource languages (LRLs). However, there are flaws in the planning, data collection and development phases, mostly due to the assumption that LRLs are similar to High Resource Languages (HRLs) but only smaller in size. In our paper, we first provide examples of failed LTs for LRLs and explain the reasons for these failures. Second, we discuss problematic issues with the data for LRLs. Finally, we provide recommendations for building better LTs for LRLs through insights from sociolinguistics and multilingualism. Our goal is not to solve all problems around LTs for LRLs but to raise awareness about the existing issues, provide recommendations toward possible solutions and encourage collaboration across academic disciplines for developing LTs that actually serve the needs and preferences of the LRL communities.