An Automated Clarity Tool for Dutch and English Discourse

Start date: Jan. 1, 2009
End date: Oct. 31, 2012
Sponsor: Ghent University College Research Fund
Research portal: http://research.flw.ugent.be/projects/hendi

About Hendi

In a society that constantly communicates through writing and conversation, the clarity of documents is crucial. Technical, governmental, medical and other documents must be understandable. But how clear are Dutch and English texts today? And how clear are texts that originate in a multilingual context? The Hendi project revolves around these research questions.

Clarity is primarily a monolingual property: a reader or listener is interested in the clarity of the text itself, whatever its origin. Yet many texts emerge from a multilingual setting. In translated as well as interpreted discourse, a balance must be found between adequacy and clarity: a translation or interpretation must be adequate, but the intended audience also wants it to be clear.

The Hendi project has two main goals. The first is to create an automatic application that assigns a clarity score to a previously unseen input document, based on lexical, syntactic and pragmatic information extracted from the text. The second is to use that application to compare texts in different settings. How does the clarity of a source text relate to the clarity of its translations or transcribed interpretations? And what about the clarity of comparable texts, e.g. two newspaper articles about the same subject?

Data sets

Data sets used in the publications are available in zip and tar.gz formats. Please read the terms of use and disclaimers in the README before using the data sets.