A Posteriori Agreement as a Quality Measure for Readability Prediction Systems
- van Oosten, P., Hoste, V., & Tanghe, D.
- Alexander Gelbukh (Ed.)
- Computational Linguistics and Intelligent Text Processing (CICLing, Tokyo, Japan)
- Lecture Notes in Computer Science
- Springer-Verlag
All readability research is ultimately concerned with the question of whether a prediction system can automatically determine the readability level of an unseen text. A significant problem for such a system is that readability may depend in part on the reader. If different readers assess the readability of texts in fundamentally different ways, there is insufficient a priori agreement to justify the correctness of a readability prediction system trained on texts assessed by those readers.
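As a rough illustration of what a priori agreement means in practice, the sketch below computes pairwise Cohen's kappa between hypothetical expert labelings. The expert names and labels are invented, and kappa is only one of several agreement measures such a study could rely on.

```python
# A minimal sketch (not from the paper): quantify a priori agreement
# between expert readers via pairwise Cohen's kappa. Expert names and
# labels are hypothetical.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Hypothetical readability labels (0 = easy, 1 = medium, 2 = hard)
# that three experts assigned to the same five texts.
assessments = {
    "expert_a": [0, 1, 2, 1, 0],
    "expert_b": [0, 1, 2, 2, 0],  # largely agrees with expert_a
    "expert_c": [2, 0, 1, 0, 2],  # assesses readability very differently
}

# Kappa near 1 means strong agreement, near 0 chance-level agreement,
# and negative values systematic disagreement.
for (a, ya), (b, yb) in combinations(assessments.items(), 2):
    print(f"{a} vs {b}: kappa = {cohen_kappa_score(ya, yb):.2f}")
```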
We built a data set of readability assessments by expert readers. We clustered the experts into groups with greater a priori agreement and then measured, for each group, whether classifiers trained only on that group's data exhibited a classification bias. Since such a bias was indeed found, a classification mechanism trained on one group's assessments cannot be unproblematically generalized to a different user group.
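The bias check can be pictured with a small sketch using invented data rather than the paper's: train one classifier per expert group on the same texts, then compare their predictions on unseen texts, where low overlap signals a group-specific bias. The features, model, and label construction below are assumptions for illustration only.

```python
# A minimal sketch (invented data, not the authors' setup): train one
# classifier per expert group, then compare their predictions on the
# same unseen texts. Low overlap indicates a group-specific bias.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical text features (e.g., sentence length, word frequency).
X_train = rng.normal(size=(200, 4))
X_unseen = rng.normal(size=(50, 4))

# Two internally consistent groups that label the same training texts
# according to different notions of readability (feature 0 vs feature 1).
y_group1 = (X_train[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)
y_group2 = (X_train[:, 1] + 0.1 * rng.normal(size=200) > 0).astype(int)

clf1 = LogisticRegression().fit(X_train, y_group1)
clf2 = LogisticRegression().fit(X_train, y_group2)

pred1 = clf1.predict(X_unseen)
pred2 = clf2.predict(X_unseen)

# If the groups shared one notion of readability, the classifiers would
# mostly coincide here; chance-level overlap signals classification bias.
print(f"agreement on unseen texts: {np.mean(pred1 == pred2):.0%}")
```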