Methodological advances in lexical pattern extraction : examples from Spanish adventure tourism

Publication type
B2
Publication status
Published
Authors
Goethals, P., & Degraeuwe, JRD
Editor
Isabel Durán-Muñoz and Eva Lucía Jiménez-Navarro
Series
Exploring the language of adventure tourism : a corpus-assisted approach
Volume
24
Pagination
85-110
Publisher
Peter Lang (Berlin)

Abstract

This chapter aims to contribute to methodological innovation in the description of language use for specific purposes, and of the language of adventure tourism in particular. We describe techniques such as dependency parsing and semantic similarity calculation based on non-contextual word embeddings and transformer-based language models, in order to show how these recent developments can lead to a more fruitful use of small and mid-size corpora, which are typical when studying languages for specific purposes. In such corpora, purely quantitative filters impose overly strict limits on less frequent constructions. At the same time, we discuss the challenges posed by the new techniques, since they require adequate fine-tuning. Semantic similarity calculation in particular seems to offer important opportunities to explore the full richness of small and mid-size corpora such as the ADVENCOR corpus: it takes recurrent constructions as a starting point and enriches these data by identifying semantically similar constructions that do not pass the frequency thresholds by themselves. We apply these methodological insights to verb constructions that are typical of adventure tourism discourse.
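
As a rough illustration of the enrichment idea described in the abstract, the sketch below starts from constructions that pass a frequency threshold and attaches rarer constructions whose vectors are close to them. It is not the chapter's implementation: the function names, the thresholds, the example constructions and the three-dimensional vectors are all hypothetical, and plain cosine similarity over hand-made vectors stands in for similarity scores derived from word embeddings or a transformer-based model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_by_similarity(counts, vectors, min_freq, min_sim):
    """Map each frequent construction to the rare ones that resemble it.

    counts   : construction -> corpus frequency
    vectors  : construction -> embedding (hypothetical toy vectors here)
    min_freq : frequency threshold a seed construction must reach
    min_sim  : minimum cosine similarity for a rare construction to be kept
    """
    seeds = [c for c, n in counts.items() if n >= min_freq]
    rare = [c for c, n in counts.items() if n < min_freq]
    return {
        seed: [r for r in rare if cosine(vectors[seed], vectors[r]) >= min_sim]
        for seed in seeds
    }


# Made-up frequencies and vectors, for illustration only (not ADVENCOR data).
counts = {"practicar senderismo": 12, "hacer trekking": 2, "reservar hotel": 1}
vectors = {
    "practicar senderismo": [0.9, 0.1, 0.0],
    "hacer trekking": [0.8, 0.2, 0.1],
    "reservar hotel": [0.0, 0.1, 0.9],
}

result = expand_by_similarity(counts, vectors, min_freq=5, min_sim=0.8)
print(result)
```

With these toy values, only the frequent construction acts as a seed, and the rare construction with a nearby vector is attached to it while the unrelated one is left out. In practice the choice of thresholds and of the embedding model would need the kind of tuning the abstract refers to.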