EmDComF

Publication type
Publication status: Published
Author: Debaene, F
Publisher: Hugging Face

Abstract

Raw text extracted from 466 early modern Dutch comedies and farces, split into sentences with the nltk sentence tokenizer and provided with author indications. Gold data comes from DBNL and CENETON; OCR data was post-corrected with an mBART model fine-tuned on floriandebaene/EmDComF_OCR_post-correction, available on Hugging Face.
