Lemmatisation & morphological analysis of unedited Greek : do simple tasks need complex solutions?

Publication type
C1
Publication status
In press
Authors
Swaelens, C., De Vos, I., & Lefever, E.
Series
63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)
Publisher
Association for Computational Linguistics (ACL) (Vienna)
Conference
63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) (Vienna, Austria)

Abstract

Fine-tuning transformer-based models for part-of-speech tagging of unedited Greek text has outperformed traditional systems. However, when applied to lemmatisation or morphological analysis, fine-tuning has not yet achieved competitive results. This paper explores various approaches to combining morphological features, both to reduce label complexity and to enhance multi-task training. Specifically, we group three nominal features into a single label, and combine the three most distinctive features of verbs into another unified label. These combined labels are used to fine-tune DBBERT, a BERT model pre-trained on both ancient and modern Greek. Additionally, we experiment with joint training, both among these labels and in combination with POS tagging, within a multi-task framework that improves performance through shared parameters. To evaluate our models, we use a manually annotated gold standard from the Database of Byzantine Book Epigrams. Our results show an improvement of nearly 9 percentage points, demonstrating that multi-task learning is a promising approach for linguistic annotation in less standardised corpora.
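
The abstract gives the setup only in outline; a minimal sketch of the general approach it describes, written in PyTorch with the Hugging Face transformers library, might look as follows. The checkpoint path, the choice of case, number, and gender as the three grouped nominal features, the label counts, and the summed cross-entropy loss are illustrative assumptions, not the paper's confirmed implementation.

    import torch
    from torch import nn
    from transformers import AutoModel


    def combine_nominal(case: str, number: str, gender: str) -> str:
        """Collapse three nominal features into one tag, e.g.
        ("Nom", "Sg", "Fem") -> "Nom|Sg|Fem", so one classifier
        predicts the combination instead of three separate heads."""
        return "|".join((case, number, gender))


    class MultiTaskTagger(nn.Module):
        """Shared BERT encoder with one token-classification head per task
        (POS, combined nominal label, combined verbal label)."""

        def __init__(self, encoder_name: str, num_labels_per_task: dict[str, int]):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(encoder_name)
            hidden = self.encoder.config.hidden_size
            self.heads = nn.ModuleDict(
                {task: nn.Linear(hidden, n) for task, n in num_labels_per_task.items()}
            )

        def forward(self, input_ids, attention_mask):
            # One forward pass through the shared encoder feeds every head.
            states = self.encoder(
                input_ids=input_ids, attention_mask=attention_mask
            ).last_hidden_state  # (batch, seq_len, hidden)
            return {task: head(states) for task, head in self.heads.items()}


    def multitask_loss(logits: dict, labels: dict) -> torch.Tensor:
        """Sum token-level cross-entropy over all jointly trained tasks,
        so each task's gradient updates the shared encoder parameters."""
        ce = nn.CrossEntropyLoss(ignore_index=-100)  # -100 masks padding/subwords
        return sum(ce(logits[t].transpose(1, 2), labels[t]) for t in logits)


    # Hypothetical usage: checkpoint path and label counts are placeholders.
    model = MultiTaskTagger(
        "path/to/dbbert",
        {"pos": 14, "nominal": 120, "verbal": 200},
    )

Grouping features shrinks the prediction problem from several small classifiers to one larger tag set per word class, while the shared encoder lets POS tagging and the combined morphological labels transfer information to one another during joint training.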