From Character to Word Level: Enabling the Linguistic Analyses of Inputlog Process Data

Publication type: C1
Publication status: Published
Authors: Leijten, M., Macken, L., Hoste, V., Van Horenbeeck, E., & Van Waes, L.
Journal: Proceedings of the EACL 2012 Workshop on Computational Linguistics and Writing
Pagination: 1-8
Publisher: Association for Computational Linguistics (Avignon, France)
External link: http://www.aclweb.org/anthology/W/W12/W12-0301.pdf
Project: Inputlog++

Abstract

Keystroke-logging tools are widely used in writing process research. These applications are designed to capture each character and mouse movement as an isolated event that serves as an indicator of cognitive processes. The current research project explores the possibilities of aggregating the logged process data from the letter level (keystroke) to the word level by merging them with existing lexica and using NLP tools. Linking writing process data to lexica and NLP tools enables researchers to analyze the data at a higher, more complex level. In this project, the output data of Inputlog are segmented at the sentence level and then tokenized. However, writing process data by definition do not always represent clean and grammatical text. Coping with this problem was one of the main challenges in the current project. To address it, a parser has been developed that extracts three types of data from the S-notation: word-level revisions, deleted fragments, and the final writing product. Within-word typing errors are identified and excluded from further analyses. At this stage, the Inputlog process data are enriched with the following linguistic information: part-of-speech tags, lemmas, chunks, syllable boundaries, and word frequencies.
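The extraction step described in the abstract can be illustrated with a minimal sketch. It assumes a simplified, hypothetical S-notation in which `|n` marks a break, `[text]` marks deleted text and `{text}` marks text inserted later; Inputlog's actual S-notation output differs (for example, in how indices are attached to brackets). The names `parse_s_notation`, `ParsedSNotation` and `tokenize` are illustrative only and are not the parser developed in the project. The sketch recovers the final product and the deleted fragments, handles nested revisions with a stack, and uses a naive regex tokenizer in place of a real NLP tokenizer.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified S-notation (not Inputlog's exact format):
#   |n      break marker with index n
#   [text]  text that was deleted
#   {text}  text that was inserted later
BREAK_RE = re.compile(r"\|\d+")


@dataclass
class ParsedSNotation:
    final_text: str
    deleted_fragments: list[str] = field(default_factory=list)


def parse_s_notation(s: str) -> ParsedSNotation:
    """Split simplified S-notation into the final product and deleted fragments."""
    text = BREAK_RE.sub("", s)  # break markers carry no text, so drop them
    final: list[str] = []
    deleted: list[str] = []
    # Stack of open brackets; each frame keeps a buffer of the text typed inside it.
    stack: list[tuple[str, list[str]]] = []
    for ch in text:
        if ch in "[{":
            stack.append((ch, []))
        elif ch in "]}":
            if not stack:
                raise ValueError("unbalanced closing bracket")
            kind, buf = stack.pop()
            if kind != ("[" if ch == "]" else "{"):
                raise ValueError("mismatched brackets")
            if kind == "[":
                # Text inside a deletion, minus any nested deletions already removed.
                deleted.append("".join(buf))
        else:
            # A character belongs to the innermost open deletion, if any;
            # otherwise it survives into the final writing product.
            target = next((buf for kind, buf in reversed(stack) if kind == "["), None)
            (final if target is None else target).append(ch)
    if stack:
        raise ValueError("unclosed bracket")
    return ParsedSNotation("".join(final), deleted)


def tokenize(text: str) -> list[str]:
    """Naive word/punctuation tokenizer, a stand-in for a real NLP tokenizer."""
    return re.findall(r"\w+|[^\w\s]", text)
```

For example, `parse_s_notation("The |1[cta]|2{cat} sat[s] on the mat.")` returns the final text `"The cat sat on the mat."` and the deleted fragments `["cta", "s"]`. Deciding which deleted fragments are within-word typing errors (such as `"cta"`) and which are genuine word-level revisions would be a separate classification step, which this sketch does not attempt.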
