GENDEROUS : machine translation and cross-linguistic evaluation of a gender-ambiguous dataset

Publication type
C1
Publication status
Published
Authors
Hackenbuchner, J., Gkovedarou, E. G., & Daems, J.
Editor
Agnieszka Faleńska, Christine Basta, Marta Costa-jussà, Karolina Stańczak and Debora Nozza
Series
Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Pagination
302-319
Publisher
Association for Computational Linguistics (ACL)
Conference
6th Workshop on Gender Bias in Natural Language Processing (GeBNLP) (Vienna, Austria)

Abstract

Contributing to research on gender beyond the binary, this work introduces GENDEROUS, a dataset of gender-ambiguous sentences containing gender-marked occupations and adjectives, and sentences with the ambiguous or non-binary pronoun "their". We cross-linguistically evaluate how machine translation (MT) systems and large language models (LLMs) translate these sentences from English into four grammatical-gender languages: Greek, German, Spanish and Dutch. We show that the systems continue to default to male-gendered translations, with some exceptions (particularly for Dutch). Prompting for alternatives, however, shows potential for attaining more diverse and neutral translations across all languages. We also implemented an LLM-as-a-judge approach; benchmarking it against gold standards emphasises the continued need for human annotations.