Evaluating existing lemmatisers on unedited Byzantine Greek poetry

Publication type
C1
Publication status
Published
Authors
Swaelens, C., De Vos, I., & Lefever, E.
Editor
Adam Anderson, Shai Gordin, Stav Klein, Bin Li, Yudong Liu and Marco Passarotti
Series
Proceedings of the Ancient Language Processing Workshop
Pagination
111-116
Conference
Ancient Language Processing Workshop (ALP), associated with the 14th International Conference on Recent Advances in Natural Language Processing (RANLP 2023) (Varna, Bulgaria)

Abstract

This paper reports the results of a comparative evaluation of four existing lemmatisers, all pre-trained on Ancient Greek texts, on a novel corpus of unedited Byzantine Greek texts. The aim of this study is to gain insight into the pitfalls of existing lemmatisation approaches as well as the specific challenges of our Byzantine Greek corpus, in order to develop a new lemmatiser that can cope with its peculiarities. The results of the experiment show an accuracy drop of 20% on our corpus, which is further investigated in a qualitative error analysis.