Lemmatisation & morphological analysis of unedited Greek : do simple tasks need complex solutions?

Publication type
C1
Publication status
In press
Authors
Swaelens, C., De Vos, I., & Lefever, E.
Series
63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)
Publisher
Association for Computational Linguistics (ACL) (Vienna)
Conference
63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) (Vienna, Austria)

Abstract

Fine-tuning transformer-based models for part-of-speech tagging of unedited Greek text has outperformed traditional systems. However, when applied to lemmatisation or morphological analysis, fine-tuning has not yet achieved competitive results. This paper explores various approaches to combining morphological features, both to reduce label complexity and to enhance multi-task training. Specifically, we group three nominal features into a single label, and combine the three most distinctive features of verbs into another unified label. These combined labels are used to fine-tune DBBERT, a BERT model pre-trained on both ancient and modern Greek. Additionally, we experiment with joint training, both among these labels and in combination with POS tagging, within a multi-task framework that improves performance through shared parameters. To evaluate our models, we use a manually annotated gold standard from the Database of Byzantine Book Epigrams. Our results show an improvement of nearly 9 percentage points, demonstrating that multi-task learning is a promising approach for linguistic annotation in less standardised corpora.
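
The abstract gives the setup only in outline; a minimal sketch of the general approach it describes, written in PyTorch with the Hugging Face transformers library, might look as follows. The checkpoint path, the choice of case, number, and gender as the three grouped nominal features, the label counts, and the summed cross-entropy loss are illustrative assumptions, not the paper's confirmed implementation.

    import torch
    from torch import nn
    from transformers import AutoModel


    def combine_nominal(case: str, number: str, gender: str) -> str:
        """Collapse three nominal features into one tag, e.g.
        ("Nom", "Sg", "Fem") -> "Nom|Sg|Fem", so one classifier
        predicts the combination instead of three separate heads."""
        return "|".join((case, number, gender))


    class MultiTaskTagger(nn.Module):
        """Shared BERT encoder with one token-classification head per task
        (POS, combined nominal label, combined verbal label)."""

        def __init__(self, encoder_name: str, num_labels_per_task: dict[str, int]):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(encoder_name)
            hidden = self.encoder.config.hidden_size
            self.heads = nn.ModuleDict(
                {task: nn.Linear(hidden, n) for task, n in num_labels_per_task.items()}
            )

        def forward(self, input_ids, attention_mask):
            # One forward pass through the shared encoder feeds every head.
            states = self.encoder(
                input_ids=input_ids, attention_mask=attention_mask
            ).last_hidden_state  # (batch, seq_len, hidden)
            return {task: head(states) for task, head in self.heads.items()}


    def multitask_loss(logits: dict, labels: dict) -> torch.Tensor:
        """Sum token-level cross-entropy over all jointly trained tasks,
        so each task's gradient updates the shared encoder parameters."""
        ce = nn.CrossEntropyLoss(ignore_index=-100)  # -100 masks padding/subwords
        return sum(ce(logits[t].transpose(1, 2), labels[t]) for t in logits)


    # Hypothetical usage: checkpoint path and label counts are placeholders.
    model = MultiTaskTagger(
        "path/to/dbbert",
        {"pos": 14, "nominal": 120, "verbal": 200},
    )

Grouping features shrinks the prediction problem from several small classifiers to one larger tag set per word class, while the shared encoder lets POS tagging and the combined morphological labels transfer information to one another during joint training.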