ShAnEL-2: a multilingual benchmarking dataset for short-answer language learning exercises

Publication type: C1
Publication status: Published
Authors: Degraeuwe, JRD, & Moerman, T.M.
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek and Antonio Toral
Series: Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Pagination: 6764-6771
Publisher: European Language Resources Association (ELRA)
Conference: Language Resources and Evaluation Conference 2026 (LREC2026) (Palma de Mallorca, Spain)

Abstract

Before using GenAI models as EdTech tools, their pedagogical suitability should be corroborated. In this paper, we present ShAnEL-2, a novel multilingual dataset comprising 1,185 student responses to short-answer language learning exercises corrected by teachers. We use ShAnEL-2 to establish an initial benchmark of (1) "off-the-shelf" GenAI models and (2) retrieval-augmented generation (RAG) techniques for the automated correction of this exercise type. With an overall accuracy of 90% and recall of 95%, few-shot RAG (which adds previously corrected responses to the prompt) outperforms the off-the-shelf baseline and textbook RAG setup (which adds coursebook materials) by up to 7 (accuracy) and 5 (recall) percentage points. These results confirm that LLMs learn better from examples than from analysing context and highlight GenAI's particular potential as a correction assistant for teachers.
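
The few-shot RAG setup described in the abstract adds previously corrected responses to the prompt. A minimal sketch of that idea follows, under stated assumptions: the `CorrectedResponse` record, the word-overlap (Jaccard) retrieval, the prompt wording, and the Spanish sample data are all hypothetical illustrations and are not taken from the paper, which does not specify its retriever or prompt format here.

```python
from dataclasses import dataclass


@dataclass
class CorrectedResponse:
    """Hypothetical record of one teacher-corrected student response."""
    exercise: str
    response: str
    verdict: str  # teacher's label, e.g. "correct" or "incorrect"


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for whatever retriever is really used."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def retrieve_examples(response: str, pool: list[CorrectedResponse], k: int = 2) -> list[CorrectedResponse]:
    """Return the k corrected responses most similar to the new response."""
    ranked = sorted(pool, key=lambda ex: jaccard(response, ex.response), reverse=True)
    return ranked[:k]


def build_prompt(exercise: str, response: str, pool: list[CorrectedResponse], k: int = 2) -> str:
    """Assemble a few-shot prompt: retrieved corrected examples, then the new item."""
    lines = [
        "You are assisting a language teacher. Label the response as correct or incorrect.",
        "",
    ]
    for ex in retrieve_examples(response, pool, k):
        lines += [
            f"Exercise: {ex.exercise}",
            f"Response: {ex.response}",
            f"Verdict: {ex.verdict}",
            "",
        ]
    lines += [f"Exercise: {exercise}", f"Response: {response}", "Verdict:"]
    return "\n".join(lines)


# Invented sample data, for illustration only.
pool = [
    CorrectedResponse("Translate: 'I am hungry'", "Tengo hambre", "correct"),
    CorrectedResponse("Translate: 'I am hungry'", "Estoy hambre", "incorrect"),
    CorrectedResponse("Translate: 'Good morning'", "Buenos días", "correct"),
]
prompt = build_prompt("Translate: 'I am very hungry'", "Tengo mucha hambre", pool, k=2)
print(prompt)
```

The resulting string would then be sent to a GenAI model of choice; in a correction-assistant workflow, the model's verdict would be a suggestion for the teacher to review rather than a final grade.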
