The lexical demands of English and French L2 textbooks: A cross-lingual corpus study

Publication type: U
Publication status: Published
Authors: Van Parys, A., De Wilde, V., Macken, L., & Montero Perez, M.
Publisher: LinGhentian Doctorials (Ghent, Belgium)
Conference: 6th LinGhentian Doctorials (Ghent, Belgium)

Abstract

Textbooks are a vital source of input in the L2 classroom. Studies have determined the lexical complexity of a variety of input types (e.g., novels, audio-visual media) by calculating lexical profiles (e.g., Nation, 2006; Webb, 2010), i.e., estimates of the distribution of words across frequency levels – the assumption being that texts made up of higher-frequency words place lower lexical demands on learners. However, lexical profiling research into L2 textbooks is limited. Moreover, the few existing studies tend to focus exclusively on English, ignoring languages to which learners may have considerably lower out-of-school exposure (e.g., French; cf. Peters et al., 2019). To address these gaps, this cross-linguistic corpus-driven study investigates both English and French L2 textbooks and aims to determine (RQ1) what the lexical profiles of the reading materials in these textbooks are, (RQ2) how these lexical demands evolve across secondary education and (RQ3) how the approach differs between the two L2s (English vs. French).
A corpus of approximately 200,000 tokens per L2 was compiled by selecting the reading texts from 36 Flemish secondary school L2 textbooks. To determine the vocabulary demands (cf. RQ1), a custom Python script was developed that creates a lexical profile for each text by categorising the words according to existing word frequency lists. A crucial decision in the lexical profiling process is the choice of lexical unit. Typically, lexical profiles are reported in terms of word families (i.e., lexical units encompassing all inflections and derivations of a headword, e.g., 'depend', 'depends' and 'dependable' are part of the same family), but recent research has shown that these may overestimate the vocabulary knowledge of learners who struggle with morphology (e.g., Brown et al., 2020). Moreover, we argue that word families are especially unsuitable for French, considering its additional morphological challenges when compared to English. For instance, a learner may know the meaning of the infinitive 'résoudre', but not of the rather different inflected form 'résolvons'. To provide as complete a cross-linguistic picture as possible, two other lexical units are explored: the word type (i.e., each distinct word form counted as a separate unit, e.g., 'depend' and 'depends' are distinct units) and the lemma (i.e., a headword and all its inflections, e.g., 'depend' and 'depends' fall under the same lemma '(to) depend'). Our lexical profiles are based on the subtitle-based frequency lists SUBTLEX-UK (van Heuven et al., 2014) for English and Lexique (New et al., 2004) for French. The profiles are supplemented with measures of lexical density (i.e., the ratio of content words to the total number of words) and lexical diversity (determined using the Measure of Textual Lexical Diversity). To determine the evolution of these different features across grade levels as well as the ways in which they differ across English and French (cf. RQ2 and RQ3), multilevel regression modelling and pairwise comparisons between grades and L2s will be performed.
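To illustrate how the choice of lexical unit can change a profile, the following is a minimal sketch and not the script used in the study. The frequency bands, the lemma lookup and the content-word set are invented toy data; a real pipeline would derive bands from a frequency list such as SUBTLEX-UK or Lexique and would use a proper lemmatiser and part-of-speech tagger. The function names `lexical_profile` and `lexical_density` are hypothetical.

```python
from collections import Counter

# Toy frequency bands (assumption): unit -> band, where 1 is the most frequent band.
TYPE_BANDS = {"the": 1, "answer": 1, "depends": 1, "on": 1, "it": 1, "depend": 2}
LEMMA_BANDS = {"the": 1, "answer": 1, "depend": 1, "on": 1, "it": 1}

# Toy lemma lookup (assumption): inflected form -> lemma.
LEMMAS = {"depends": "depend", "depended": "depend"}


def lexical_profile(tokens, bands, lemmatise=False):
    """Return the proportion of tokens that fall in each frequency band.

    Tokens whose unit is missing from `bands` are counted as 'off-list'.
    """
    if not tokens:
        return {}
    counts = Counter()
    for token in tokens:
        unit = token.lower()
        if lemmatise:
            # Map the word form to its lemma; unknown forms are left as they are.
            unit = LEMMAS.get(unit, unit)
        counts[bands.get(unit, "off-list")] += 1
    total = sum(counts.values())
    return {band: n / total for band, n in counts.items()}


def lexical_density(tokens, content_words):
    """Return the ratio of content words to the total number of tokens."""
    if not tokens:
        return 0.0
    return sum(t.lower() in content_words for t in tokens) / len(tokens)


tokens = ["The", "answer", "depended", "on", "it"]

# Word types: 'depended' is not in the toy type list, so it is off-list.
type_profile = lexical_profile(tokens, TYPE_BANDS)

# Lemmas: 'depended' maps to 'depend', which sits in band 1.
lemma_profile = lexical_profile(tokens, LEMMA_BANDS, lemmatise=True)

# Content words are given by hand here; in practice they would come from POS tags.
density = lexical_density(tokens, {"answer", "depended"})

print(type_profile, lemma_profile, density)
```

In this toy case the type-based profile places one token in five off-list, whereas the lemma-based profile places every token in the first band, which is the kind of divergence between lexical units that the comparison above is concerned with.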
This study is ongoing and entering the analysis phase. In my presentation, preliminary results will be discussed, with a special focus on the methodological decisions that needed to be made. Pedagogical implications for text selection in L2 teaching will be addressed.
