Using GWAPs for verifying PoS tagging of spoken dialectal Spanish

Publication type
C1
Publication status
Published
Authors
Bonilla, J., Segundo Diaz, R., & Bouzouita, M.
Series
10th International Conference on Behavioural and Social Computing (BESC); Proceedings
Pagination
1-7
Publisher
IEEE
Conference
2023 10th International Conference on Behavioural and Social Computing (BESC) (Larnaca, Cyprus)

Abstract

Given the scarcity of linguistic resources available for spoken varieties, this paper explores the use of gamified approaches, namely Games With A Purpose (GWAPs), for verifying Part-of-Speech (PoS) tagging of spoken dialectal Spanish. It presents the first results of the accuracy study carried out with the developed gamified methodologies and investigates whether player participation and inter-annotator agreement measures can improve the dataset. The study indicates that educational level, field of study, and geographic upbringing significantly influence human annotators’ performance in PoS tagging. Specifically, the results suggest that, as expected, Ph.D. holders have the highest mean accuracy, while, surprisingly, Master’s degree holders have the lowest. Annotators with a background in Arts and Humanities show the highest mean accuracy, while those with a background in Natural Sciences display the lowest. Moreover, players who grew up in Spain have the highest mean accuracy, followed by those from Germany and Latin America. Finally, the region being verified in the PoS tagging tasks also has a significant impact on annotator performance, with some regions exhibiting a higher mean accuracy than others. However, the unbalanced distribution of players per region could have biased this result: regions with a higher number of verifications provide a more accurate representation of annotator performance.
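
The two quantities the abstract relies on, mean accuracy per annotator group and inter-annotator agreement, can be illustrated with a minimal Python sketch. The abstract does not say which agreement measure the authors used, so Cohen's kappa stands in here as one common choice for two annotators. The record fields (`education`, `correct`), the tag labels, and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict


def mean_accuracy_by_group(records, group_key):
    """Return the mean accuracy for each value of `group_key`.

    Each record is a dict holding the grouping field and a boolean
    'correct' flag (hypothetical field names, for illustration only).
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [correct count, total count]
    for rec in records:
        totals[rec[group_key]][0] += int(rec["correct"])
        totals[rec[group_key]][1] += 1
    return {group: correct / total for group, (correct, total) in totals.items()}


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who tagged the same tokens."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: share of tokens given the same tag by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own tag distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical tag throughout
    return (observed - expected) / (1 - expected)


# Toy data, invented for this sketch.
toy_records = [
    {"education": "PhD", "correct": True},
    {"education": "PhD", "correct": True},
    {"education": "Master", "correct": True},
    {"education": "Master", "correct": False},
]
accuracy = mean_accuracy_by_group(toy_records, "education")

annotator_a = ["NOUN", "VERB", "NOUN", "ADJ"]
annotator_b = ["NOUN", "VERB", "ADJ", "ADJ"]
kappa = cohen_kappa(annotator_a, annotator_b)
```

With more than two players verifying the same token, a multi-rater measure such as Fleiss' kappa or Krippendorff's alpha would be a more natural fit than the pairwise measure sketched here.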