EmDComF_raw

Publication type
Publication status: Published
Author: Debaene, F
Publisher: Hugging Face

Abstract

Raw text of 466 early modern Dutch comedies and farces, sentence-tokenized with NLTK and labelled with author indications. The gold data comes from DBNL and CENETON; the OCR data is the raw output of Transkribus Print M1 on Google Books scans.
