DBBErt: part-of-speech tagging of pre-modern Greek text

Publication type
C1
Publication status
Published
Authors
Swaelens, C., De Vos, I., & Lefever, E.
Editors
Krister Lindén, Jyrki Niemi and Thalassia Kontino
Series
CLARIN Annual Conference Proceedings
Pagination
155-158
Conference
CLARIN Annual Conference 2023 (Leuven, Belgium)

Abstract

This contribution presents DBBErt, a machine-learning approach to linguistic annotation for pre-Modern Greek that provides part-of-speech tagging and fine-grained morphological analysis of Greek tokens. To this end, transformer-based language models were built on both pre-Modern and Modern Greek text and then fine-tuned on annotated treebanks. Evaluated on a gold standard of Byzantine book epigrams, the approach gives promising results, with an F-score of 83% for coarse-grained part-of-speech tagging and 69% for fine-grained morphological analysis. The resulting pipeline and models will be added to the CLARIN infrastructure to stimulate further research in NLP for Ancient and Medieval Greek.