Fine-tuning GPT models for lexicography

Start date
Jan. 1, 2024
End date
Dec. 31, 2027
Sponsor
Regional and community funding: Special Research Fund
Website
https://research.ugent.be/web/result/project/d4e04617-d340-11ef-b3da-03cc4b44417e/details/en
Research portal
http://research.flw.ugent.be/projects/fine-tuning-gpt-models-lexicography

About GPT 4 LEX

Soon after the release of ChatGPT, the state of the art of generative AI in lexicography was surveyed (cf. de Schryver 2023). If one is to believe that survey, as well as many subsequent studies (esp. Lew and colleagues 2024), generative AI has now made lexicographers, as well as dictionaries themselves, redundant. However, these studies conveniently assume that because generative AI works for English, it will work for any other language. It is time to reveal the truth. Pairing any other language with English only produces look-alikes: the lexicographic material appears to be sound, until one scratches the surface and realises that what was generated is ‘translated English’. When it comes to dictionaries for languages of limited diffusion, existing models mostly produce gibberish. In this research project, various comparisons will be made between out-of-the-box, customised and fine-tuned GPT models for lexicography, with a focus on monolingual dictionaries for undocumented Bantu languages.