MariATE: Automatic Term Extraction Using Large Language Models in the Maritime Domain

Publication type
U
Publication status
In press
Authors
Liu, S., Lefever, E., & Hoste, V.
Conference
International Conference on Recent Advances in Natural Language Processing (RANLP) (Varna, Bulgaria)

Abstract

This study presents a comprehensive evaluation of Large Language Models (LLMs) for automatic term extraction in the maritime safety domain. The research examines the zero-shot performance of seven state-of-the-art LLMs, both open-source and closed-source, and investigates two terminology annotation strategies for optimal coverage: nested annotation captures both complete technical expressions and their constituent components, while full-term annotation focuses exclusively on maximal-length terms. Experimental results demonstrate the superior performance of Claude-3.5-Sonnet (F1-score of 0.80) in maritime safety terminology extraction, particularly in boundary detection. Error analysis reveals three primary challenges: distinguishing contextual descriptions from legitimate terminology, handling complex multi-word expressions, and identifying operational and navigational maritime safety terms. Comparison of the annotation strategies shows that full-term annotation covers 95.24% of the unique terms identified through nested annotation; the remaining 4.76% consists of subcomponents of larger technical expressions. These findings advance the understanding of LLMs' capabilities in specialized terminology extraction and provide empirical evidence that full-term annotation is sufficient for comprehensive terminology coverage in domain-specific applications.
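
To make the reported metrics concrete, the sketch below illustrates, under stated assumptions, how exact-match term F1 and the coverage of one annotation scheme by another could be computed. It is not the paper's code: the example terms, function names, and the use of exact string matching over deduplicated terms are hypothetical simplifications.

```python
# Minimal sketch (not the paper's implementation): nested vs. full-term
# annotation inventories, a coverage ratio like the 95.24% figure in the
# abstract, and exact-match F1 for LLM-extracted terms against a gold list.
# All example terms and helper names below are hypothetical.

def unique_terms(annotations):
    """Lowercase, strip, and deduplicate annotated term strings."""
    return {t.lower().strip() for t in annotations}

def coverage(covering, reference):
    """Share of unique reference terms also present in the covering set."""
    ref = unique_terms(reference)
    return len(unique_terms(covering) & ref) / len(ref)

def term_f1(predicted, gold):
    """Exact-match F1 over unique term strings, a common ATE metric."""
    pred, gold = unique_terms(predicted), unique_terms(gold)
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Full-term annotation keeps only maximal-length expressions ...
    full_term = ["man overboard recovery procedure", "liferaft"]
    # ... while nested annotation also marks their constituent terms.
    nested = full_term + ["man overboard", "recovery procedure"]

    print(f"full-term coverage of nested inventory: {coverage(full_term, nested):.2%}")

    # Hypothetical LLM output scored against the nested gold standard.
    llm_output = ["man overboard", "liferaft", "weather conditions"]
    print(f"exact-match F1: {term_f1(llm_output, nested):.2f}")
```

In this toy example the full-term inventory covers only half of the nested inventory; in the study, by contrast, full-term annotation covers 95.24% of the unique terms, with the remainder being subcomponents of longer expressions.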