Aligning linguistically motivated phrases

Publication type: C1
Publication status: Published
Authors: Macken, L., & Daelemans, W.
Editors: S. Verberne, H. van Halteren, and P. Coppen
Journal: Computational Linguistics in the Netherlands 2007: selected papers from the eighteenth CLIN meeting
Series: LOT Occasional series
Volume: 11
Pagination: 37-52
Publisher: Netherlands Graduate School of Linguistics (Nijmegen, The Netherlands)

Abstract

In this paper, we describe the architecture of a sub-sentential alignment system that links linguistically motivated phrases in parallel texts. We conceive our sub-sentential aligner as a cascade model consisting of two phases. In the first phase, anchor chunks are linked on the basis of lexical correspondences and syntactic similarity. The second phase will focus on the more complex translational correspondences, based on observed translation shift patterns; the anchor chunks from the first phase will be used to limit its search space. We present the first results of this alignment system. Depending on text type, our baseline system obtains recall scores of 44% to 59% and precision scores of 90% to 98%. We experimented with two types of bilingual dictionaries to generate the lexical correspondences: a handcrafted bilingual dictionary and probabilistic bilingual dictionaries. We demonstrate that although the handcrafted dictionary is twice the size of the probabilistic dictionary, it yields lower recall scores.
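The abstract does not spell out the exact linking criteria of the first phase, so the following is only a minimal sketch of the general idea of anchor-chunk linking, not the authors' method. It assumes two hypothetical rules chosen to favour precision over recall: a source chunk is linked to a target chunk only if both carry the same syntactic label and every source token has a dictionary translation inside the target chunk, and only if exactly one target chunk satisfies this. The names `Chunk`, `lexically_linked`, `link_anchor_chunks` and `precision_recall` are illustrative; the precision and recall functions use the standard definitions over sets of links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tokens: tuple  # word forms in the chunk
    label: str     # syntactic category, e.g. "NP" (hypothetical tag set)


def lexically_linked(src, tgt, lexicon):
    """Assumed rule: every source token has a translation present in the target chunk."""
    tgt_tokens = set(tgt.tokens)
    return all(lexicon.get(tok, set()) & tgt_tokens for tok in src.tokens)


def link_anchor_chunks(src_chunks, tgt_chunks, lexicon):
    """Sketch of phase 1: link (source index, target index) pairs that match in
    syntactic label and lexical coverage, skipping ambiguous cases."""
    links = set()
    for i, s in enumerate(src_chunks):
        candidates = [j for j, t in enumerate(tgt_chunks)
                      if t.label == s.label and lexically_linked(s, t, lexicon)]
        # Only unambiguous matches become anchors (assumption favouring precision).
        if len(candidates) == 1:
            links.add((i, candidates[0]))
    return links


def precision_recall(predicted, gold):
    """Standard precision and recall of predicted links against a gold standard."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy English-Dutch example with a hypothetical bilingual lexicon.
    lexicon = {"the": {"het", "de"}, "red": {"rode"}, "house": {"huis"},
               "is": {"is"}, "old": {"oud"}}
    src = [Chunk(("the", "red", "house"), "NP"), Chunk(("is", "old"), "VP")]
    tgt = [Chunk(("het", "rode", "huis"), "NP"), Chunk(("is", "oud"), "VP")]
    links = link_anchor_chunks(src, tgt, lexicon)
    print(sorted(links))                                        # [(0, 0), (1, 1)]
    print(precision_recall(links, {(0, 0), (1, 1), (2, 2)}))    # (1.0, 0.6666666666666666)
```

The strict coverage and uniqueness rules illustrate why such an anchor phase can reach high precision while leaving many chunks unlinked: any chunk containing a word missing from the dictionary, or matching more than one target candidate, is simply left for a later phase.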
