EmDComF

Publication type
Publication status
Published
Author
Debaene, F
Publisher
Hugging Face

Abstract

Raw text extracted from 466 early modern Dutch comedies and farces, sentence-tokenized with NLTK and labelled with author indications. Gold data comes from DBNL and CENETON; OCR data was post-corrected with an mBART model fine-tuned on floriandebaene/EmDComF_OCR_post-correction on Hugging Face.