Linguistic annotation of Byzantine book epigrams

Publication type
A1
Publication status
In press
Authors
Swaelens, C., De Vos, I., & Lefever, E.
Journal
LANGUAGE RESOURCES AND EVALUATION
Download
(.pdf)
View in Biblio
(externe link)

Abstract

In this paper, we explore the feasibility of developing a part-of-speech tagger for not-normalised, Byzantine Greek epigrams. Hence, we compared three different transformer-based models with embedding representations, which are then fine-tuned on a fine-grained part-of-speech tagging task. To train the language models, we compiled two data sets: the first consisting of Ancient and Byzantine Greek texts, the second of Ancient, Byzantine and Modern Greek. This allowed us to ascertain whether Modern Greek contributes to the modelling of Byzantine Greek. For the supervised task of part-of-speech tagging, we collected a training set of existing, annotated (Ancient) Greek texts. For evaluation, a gold standard containing 10,000 tokens of unedited Byzantine Greek poems was manually annotated and validated through an inter-annotator agreement study. The experimental results look very promising, with the BERT model trained on all Greek data achieving the best performance for fine-grained part-of-speech tagging.