DBBErt : part-of-speech tagging of pre-modern Greek text

Publication type: C1
Publication status: Published
Authors: Swaelens, C., De Vos, I., & Lefever, E.
Editors: Krister Lindén, Jyrki Niemi and Thalassia Kontino
Series: CLARIN Annual Conference Proceedings
Pagination: 155-158
Conference: CLARIN Annual Conference 2023 (Leuven, Belgium)

Abstract

This contribution presents DBBErt, a machine-learning approach to linguistic annotation for pre-Modern Greek that provides part-of-speech tags and a fine-grained morphological analysis of Greek tokens. To this end, transformer-based language models were built on both pre-Modern and Modern Greek text and then fine-tuned on annotated treebanks. On a gold standard of Byzantine book epigrams, the experimental results are promising, with an F-score of 83% for coarse-grained part-of-speech tagging and 69% for fine-grained morphological analysis. The resulting pipeline and models will be added to the CLARIN infrastructure to stimulate further research in NLP for Ancient and Medieval Greek.
