Vocabulary knowledge is a crucial building block for text comprehension (Schmitt et al., 2011). As such, a widely recommended step for structuring learning materials is to rely on lexical profiling (Sun & Dang, 2020). Lexical profiling involves categorising the tokens of a text or corpus across the levels of a frequency list, which is typically structured in bands of 1,000 word families. This makes it possible to calculate vocabulary loads, i.e., the number of word frequency levels required to reach the text coverage thresholds that are crucial for comprehension. The most frequently reported coverage levels are 95% and 98% of text tokens, as these have been shown to be necessary for basic and optimal text comprehension, respectively (Webb, 2021). The higher a text’s vocabulary loads, the more demanding its vocabulary is assumed to be. Thus, lexical profiling provides education professionals and researchers with an objective method for assessing the lexical challenge of texts. This study presents LexPro, a new plurilingual lexical profiling tool with three main aims: (1) to integrate recent findings from vocabulary research (specifically related to word frequency and word counting units), (2) to allow analysis in four languages (English, French, Spanish, and Dutch), given the relatively few tools available to audiences working with non-English L2s, and (3) to be user-friendly, considering the limited use of lexical profilers in vocabulary instruction today (Dang & Webb, 2020).
LexPro categorises tokens across frequency levels provided by subtitle-based lists, as these have been shown to be more predictive of learner knowledge than frequencies derived from general corpora (Pinchbeck et al., 2022; van Heuven et al., 2014). Although the traditional counting unit in profiling research is the word family (which encompasses all inflections and derivations of a headword), LexPro relies on the flemma (which only includes inflections), as several studies have put forward this unit as more representative of learner knowledge (e.g., Brown et al., 2022; McLean, 2021). Output includes general text characteristics (e.g., text length, lexical diversity), a lexical profile with accompanying visuals, and a full overview of the vocabulary used, including insights into the items contributing to the bottom 5% of coverage. LexPro can be used for analysing both single texts and batches of texts.
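As an illustration of the vocabulary load calculation described above, the following is a minimal sketch in Python and not LexPro's actual implementation. It assumes that the text has already been tokenised and reduced to flemmas, and that a hypothetical dictionary `band_of` maps each flemma to its frequency band (1 for the most frequent 1,000 items, 2 for the next 1,000, and so on). The function name `vocabulary_load` and the toy data are likewise assumptions made for this example.

```python
from collections import Counter


def vocabulary_load(tokens, band_of, target_percent):
    """Return the lowest frequency band at which cumulative coverage
    reaches target_percent of all tokens, or None if it is never reached.

    tokens: list of flemmatised tokens (assumed preprocessed).
    band_of: hypothetical mapping from flemma to band number (1 = most frequent).
    target_percent: integer coverage target, e.g. 95 or 98.
    """
    total = len(tokens)
    if total == 0:
        return None

    # Count tokens per band; tokens missing from the list get band None (off-list).
    per_band = Counter(band_of.get(token) for token in tokens)

    covered = 0
    for band in sorted(b for b in per_band if b is not None):
        covered += per_band[band]
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= target_percent * total:
            return band
    return None  # off-list tokens prevent the target from being reached


# Toy data, invented for illustration: 20 tokens spread over three bands.
band_of = {"the": 1, "cat": 1, "sit": 2, "ottoman": 3}
tokens = ["the"] * 12 + ["cat"] * 5 + ["sit"] * 2 + ["ottoman"]

load_95 = vocabulary_load(tokens, band_of, 95)
load_98 = vocabulary_load(tokens, band_of, 98)
```

In this toy text, band 1 covers 17 of 20 tokens and band 2 brings coverage to 19 of 20, so the 95% load is reached at band 2, while the 98% load requires band 3. Tokens that do not appear in the frequency list are treated as off-list here and never count towards coverage; a real profiler may handle such items (e.g., proper nouns) differently.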
To illustrate the potential applications of the tool for both research and teaching practice, we report on its use in an analysis of a corpus containing English and French L2 textbook materials (ca. 600,000 tokens). The texts were analysed in batches corresponding to the grade level for which they are intended (grades 1-6 of secondary education), and vocabulary loads at 95% and 98% coverage were assessed. The English materials show a fairly systematic increase in loads across target levels in line with recommendations from the field (e.g., Schmitt & Schmitt, 2014), whereas loads in the French materials follow a fluctuating trajectory. Practical recommendations for implementing LexPro in text selection processes (e.g., textbook development and text comparison) will be discussed.