A quarter century of Swahili corpus building (1999-2024)

Publication type
U
Publication status
Published
Author
de Schryver, G-M
Series
Presentation at the start-up meeting for the RJ-funded project ‘Modality in Swahili – Variation, Change and Transfer’
Publisher
Faculty of Humanities, Gothenburg University (Gothenburg)

Abstract

== 1. My way … ==
*** 1999: Kiswahili Internet Corpus (KIC), version 1 [1.03m, synchronic]
*** 2000: Kiswahili Internet Corpus (KIC), version 2 [2.7m, synchronic]
*** 2001–2003: Kiswahili Internet Corpus (KIC), version 3 [+2.4m, synchronic]
*** 2003: Students’ AIDS/HIV Swahili corpora [+110k, LSP]
*** 2004: TshwaneDJe Swahili Corpus (TSC), version 1 [14.8m, synchronic]
*** 2008: Sawa (a parallel English-Swahili corpus), version 1 [SW 443k // EN 542k, parallel]
*** 2011: Sawa (a parallel English-Swahili corpus), version 2 [SW 1.5m // EN 1.2m, parallel]
*** 2013: Swahili Literature Corpus, version 1 [1.4m, regional (diachronic)]
*** 2014: Swahili Literature Corpus, version 2 [2.1m, diachronic (regional)]
*** 2016: TshwaneDJe Swahili Corpus (TSC), version 2 [22m, hybrid]
*** 2019: Swahili Linguistics Corpus [400k, LSP]

== 2. The other way … ==
*** Arvi Hurskainen: Helsinki Corpus of Swahili (HCS)
*** Sketch Engine: swWaC (Swahili corpus from the web)

== 3. The third way … ==
*** Hybrid (share, exchange, cooperate)