Northern Sotho Part-of-Speech Tagger

Developers
Guy De Pauw and Gilles-Maurice de Schryver
Website
https://demos.aflat.org/

About the Northern Sotho Part-of-Speech Tagger

This demo showcases our data-driven part-of-speech tagger for Northern Sotho. It is based on the article whose abstract follows.

Abstract
In this article the integrated corpus query functionality of the dictionary compilation software TshwaneLex is analysed. Attention is given to the handling of both raw corpus data and annotated corpus data. With regard to the latter it is shown how, with a minimum of human effort, machine learning techniques can be employed to obtain part-of-speech tagged corpora that can be used for lexicographic purposes. All points are illustrated with data drawn from English and Northern Sotho. The tools and techniques themselves, however, are language-independent, and as such the encouraging outcomes of this study are far-reaching.
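To make the idea of data-driven tagging concrete, here is a minimal sketch of a most-frequent-tag baseline in Python. It is not the method used by the demo or described in the article, which are not specified on this page. The toy corpus, the tag names (DET, NOUN, VERB) and the function names are all invented for illustration. The sketch only shows the general principle: a tagger learns word-to-tag associations from a small hand-annotated sample and applies them to new text.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sentences):
    """Learn the most frequent tag per word from annotated sentences."""
    counts = defaultdict(Counter)  # word -> Counter of tags seen with it
    tag_totals = Counter()         # overall tag frequencies
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            tag_totals[tag] += 1
    # Keep only the single most frequent tag for each known word.
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    # Unknown words fall back to the most frequent tag overall.
    default_tag = tag_totals.most_common(1)[0][0]
    return lexicon, default_tag


def tag(tokens, lexicon, default_tag):
    """Tag each token with its learned tag, or the default if unseen."""
    return [(t, lexicon.get(t.lower(), default_tag)) for t in tokens]


# Hypothetical, hand-made training sample (not data from the article).
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
]

lexicon, default_tag = train_unigram_tagger(TOY_CORPUS)
tagged = tag(["the", "cat", "barks", "loudly"], lexicon, default_tag)
print(tagged)
```

Because the approach relies only on annotated examples and not on language-specific rules, the same code would run unchanged on a tagged sample of any language. Practical taggers also use the surrounding context and the shape of unknown words, which this baseline ignores.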