Bantu Corpus Linguistics and Lexicography

Gilles-Maurice de Schryver
Target group
MA in African Studies, MA in Advanced Studies in Linguistics (main subject Linguistics in a Comparative Perspective), etc.

About Bantu Corpus Linguistics and Lexicography


Bantu languages, corpus linguistics, lexicography, data-driven language technology

Position of the course

This is an advanced course that introduces corpus linguistics for the Bantu languages as well as data-driven lexicography. Students acquire in-depth knowledge of the methodologies, the tools, and the strengths and limitations of the analytical apparatus, and are invited to put the acquired knowledge into practice.


Corpus linguistics is a booming field, well covered both in basic textbooks (e.g. Biber et al. 1998, Kennedy 1998, McEnery & Wilson 2001) and in more advanced studies (e.g. McEnery et al. 2006, Renouf & Kehoe 2009, McEnery & Hardie 2012). It has largely been used to investigate the world’s major languages (e.g. Sinclair 1991, Meyer 2002, O'Keeffe & McCarthy 2010). However, even collections that aim at sampling the world’s languages end up dealing largely with European languages (e.g. Wilson et al. 2006). The same can, mutatis mutandis, be said about data-driven lexicography. This advanced course challenges that status quo by focusing on corpus linguistics and lexicography for the Bantu language family. Given the state of the discipline, results from the wider field of African language technology (De Pauw et al. 2006-16) are also brought in.

Each class consists of two parts. In the first, a topic from the list below is considered; in the second, the newly acquired knowledge is immediately put into practice on the computer.

  • Class 1 – Bantu CORPORA: What, how and use? (de Schryver & Prinsloo 2000a, de Schryver 2002)
  • Class 2 – SOFTWARE for Bantu corpus linguistics and data-driven lexicography (Scott 1996-2016, Joffe 2002-16, de Schryver & De Pauw 2007)
  • Class 3 – Bantu corpus APPLICATIONS: Fundamental research, teaching and language learning (Prinsloo & de Schryver 2001)
  • Class 4 – Bantu SPELLCHECKERS: non-word error detection (Prinsloo & de Schryver 2003)
  • Class 5 – Bantu corpus TERMINOGRAPHY (Taljard & de Schryver 2002)
  • Class 6 – Bantu corpus-based TRANSLATION studies (Gauton et al. 2003)
  • Class 7 – Bantu corpus LEXICOGRAPHY 1: Basic aspects (de Schryver & Prinsloo 2000b, 2000c)
  • Class 8 – Bantu corpus LEXICOGRAPHY 2: Advanced aspects (de Schryver & Joffe 2004, de Schryver et al. 2006)
  • Class 9 – Bantu corpus LINGUISTICS 1: Synchronic aspects (de Schryver & Nabirye 2010)
  • Class 10 – Bantu corpus LINGUISTICS 2: Diachronic aspects (de Schryver & Gauton 2002)
  • Class 11 – Bantu corpus LINGUISTICS 3: Strengths (Kawalya et al. 2014)
  • Class 12 – Bantu corpus LINGUISTICS 4: Limitations (Bostoen & de Schryver 2015)

Through state-of-the-art literature, students familiarize themselves with the early stages, development, and current use of data-driven methods in Bantu linguistics and lexicography. The differences from other approaches to data collection, data analysis and data synthesis (including questionnaires, stimuli, grammaticality judgement tests, introspection and intuition) are also given due attention. Knowledge of Bantu languages or other African languages is an advantage, but not a prerequisite for this course. Each week students are asked to read one or more journal articles or book chapters in preparation for the lesson. The contents of these articles and chapters are discussed in class and placed in a broader perspective.