Generative Artificial Intelligence & Three Lexicographic Fs: FANTASTIC for monolingual English dictionaries, FAKE for translation dictionaries, FAIL for exotic dictionaries

Publication type
U
Publication status
Published
Author
de Schryver, G-M
Series
Paper presented at the Department of English, University of Missouri, Columbia, MO, USA, 5 April 2024

Abstract

Merely eight months after the release of ChatGPT to the wider public on November 30, 2022, the state of the art of using generative AI [gen AI] in lexicography was surveyed (cf. de Schryver 2023). Now, another eight months later, it is necessary to take stock again. If one is to believe the ten studies in de Schryver (2023) and all subsequent studies to date (esp. Lew and colleagues 2024), gen AI has now made lexicographers, as well as dictionaries themselves, redundant. However, all these studies conveniently assume that because gen AI works for English, it will work for any other language. It is time to reveal the truth. Pairing any other language with English only produces look-alikes: the lexicographic material appears to be sound until one scratches the surface and realises that what was generated is ‘translated English’, that is, translated English meanings and translated English language structures rather than the source-language meanings and source-language structures. When it comes to dictionaries for languages of limited diffusion [LLDs], the existing large language models [LLMs] mostly produce gibberish, because the models contain too little LLD data. During the presentation, these three aspects will be unpacked and amply illustrated with ‘beautiful’ dictionary articles produced by gen AI for English, ‘erroneous’ dictionary articles produced by gen AI for Portuguese-English, and ‘crap’ dictionary articles produced by gen AI for samples from the Bantu language family.