Metrics of syntactic equivalence to assess translation difficulty

Publication type
B2
Publication status
Published
Authors
Vanroy, B., De Clercq, O., Tezcan, A., Daems, J., & Macken, L.
Editor
Michael Carl
Series
Explorations in empirical translation process research
Volume
3
Pagination
259-294
Publisher
Springer

Abstract

We propose three linguistically motivated metrics to quantify syntactic equivalence between a source sentence and its translation. First, Syntactically Aware Cross (SACr) measures the degree of word group reordering by creating syntactically motivated groups of words that are aligned. Second, an intuitive approach is to compare the linguistic labels of the word-aligned source and target tokens. Finally, on a deeper linguistic level, Aligned Syntactic Tree Edit Distance (ASTrED) compares the dependency structures of both sentences. To make source and target dependency labels comparable, we use Universal Dependencies (UD). We analyse the metrics by comparing them with translation process data in mixed models. Although our examples and analysis focus on English as the source language and Dutch as the target language, the proposed metrics can be applied to any language for which UD models are available. An open-source implementation is available.
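
To make the first two ideas concrete, the following is a minimal sketch, not the authors' open-source implementation. It counts crossing alignment links between individual words (SACr as described above works on syntactically motivated word groups, which this sketch does not build) and counts aligned token pairs whose dependency labels differ. The function names, the example sentence pair, its word alignment, and its UD labels are all assumptions chosen for illustration.

```python
from itertools import combinations


def count_crosses(alignments):
    """Count pairs of alignment links (src_idx, tgt_idx) that cross.

    Two links cross when their source order and target order disagree.
    This is a word-level simplification, not the group-based SACr metric.
    """
    return sum(
        1
        for (s1, t1), (s2, t2) in combinations(alignments, 2)
        if (s1 - s2) * (t1 - t2) < 0
    )


def count_label_changes(alignments, src_labels, tgt_labels):
    """Count aligned token pairs whose linguistic labels differ."""
    return sum(1 for s, t in alignments if src_labels[s] != tgt_labels[t])


# Hypothetical example: English "I have seen him" -> Dutch "Ik heb hem gezien"
alignments = [(0, 0), (1, 1), (2, 3), (3, 2)]     # assumed word alignment
src_labels = ["nsubj", "aux", "root", "obj"]      # assumed UD labels (English)
tgt_labels = ["nsubj", "aux", "obj", "root"]      # assumed UD labels (Dutch)

crosses = count_crosses(alignments)
label_changes = count_label_changes(alignments, src_labels, tgt_labels)
print(crosses, label_changes)
```

In this example only the links for "seen" and "him" cross, and every aligned pair keeps its label, so the sentence pair shows reordering without any label change. Comparing full dependency structures, as ASTrED does, requires a tree edit distance and is not covered by this sketch.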