A Posteriori Agreement as a Quality Measure for Readability Prediction Systems

Publication type
P1
Publication status
Published
Authors
van Oosten, P., Hoste, V., & Tanghe, D.
Editor
Alexander Gelbukh
Journal
Computational Linguistics and Intelligent Text Processing
Series
Lecture Notes in Computer Science
Volume
6609
Pagination
424-435
Publisher
Springer-Verlag (Berlin/Heidelberg)
External link
http://dx.doi.org/10.1007/978-3-642-19437-5_35
Project
Hendi

Abstract

All readability research is ultimately concerned with the question of whether a prediction system can automatically determine the level of readability of an unseen text. A significant problem for such a system is that readability may depend in part on the reader. If different readers assess the readability of texts in fundamentally different ways, there is insufficient a priori agreement to justify the correctness of a readability prediction system based on the texts assessed by those readers. We built a data set of readability assessments by expert readers. We clustered the experts into groups with greater a priori agreement and then measured, for each group, whether classifiers trained only on data from that group exhibited a classification bias. As this was found to be the case, the classification mechanism cannot simply be generalized to a different user group.
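The procedure outlined in the abstract, measuring a priori agreement between experts, clustering them into more homogeneous groups, and then checking whether classifiers trained per group still agree with one another, can be illustrated with a small sketch. The sketch below is hypothetical and not the authors' code or experimental setup: the synthetic data, the use of Cohen's kappa, average-linkage clustering, majority-vote group labels, and logistic regression are all illustrative assumptions.

```python
"""Hypothetical sketch: cluster annotators by pairwise a priori agreement,
train one classifier per cluster, then compare the classifiers' predictions
across clusters to probe for a classification bias (a posteriori agreement).
Not the authors' implementation; all choices below are assumptions."""

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Synthetic stand-in data (the study itself used expert assessments of texts).
n_texts, n_experts, n_features = 200, 6, 10
X = rng.normal(size=(n_texts, n_features))      # text features (e.g. readability cues)
labels = np.empty((n_experts, n_texts), dtype=int)
for e in range(n_experts):
    criterion = X[:, 0] if e < 3 else X[:, 1]   # two hypothetical "schools" of assessment
    noise = rng.normal(scale=0.5, size=n_texts)
    labels[e] = (criterion + noise > 0).astype(int)

# A priori agreement: pairwise Cohen's kappa between experts.
kappa = np.ones((n_experts, n_experts))
for i in range(n_experts):
    for j in range(i + 1, n_experts):
        kappa[i, j] = kappa[j, i] = cohen_kappa_score(labels[i], labels[j])

# Cluster experts so that within-group agreement is high (disagreement as distance).
dist = squareform(1.0 - kappa, checks=False)
groups = fcluster(linkage(dist, method="average"), t=2, criterion="maxclust")

# Train one classifier per expert group on that group's majority-vote labels.
classifiers = {}
for g in np.unique(groups):
    members = np.where(groups == g)[0]
    y_group = (labels[members].mean(axis=0) >= 0.5).astype(int)
    classifiers[g] = LogisticRegression().fit(X, y_group)

# A posteriori agreement: do classifiers trained on different groups still agree?
for g1, clf1 in classifiers.items():
    for g2, clf2 in classifiers.items():
        if g1 < g2:
            agree = cohen_kappa_score(clf1.predict(X), clf2.predict(X))
            print(f"kappa between group-{g1} and group-{g2} classifiers: {agree:.2f}")
```

Under these assumptions, a low kappa between classifiers trained on different expert groups would signal the kind of classification bias the abstract describes: predictions that do not transfer unproblematically to a different user group.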