Can Peter Pan survive MT? A stylometric study of LLMs, NMTs, and HTs in children's literature translation

Publication type
C1
Publication status
Published
Authors
Kong, D., & Macken, L.
Editor
Bram Vanroy, Marie-Aude Lefer, Lieve Macken, Paola Ruffo, Ana Guerberof Arenas and Damien Hansen
Series
Proceedings of the Second Workshop on Creative-text Translation and Technology (CTT)
Pagination
52-70
Publisher
European Association for Machine Translation (EAMT)
Conference
Second Workshop on Creative-text Translation and Technology (CTT) (Geneva, Switzerland)

Abstract

This study evaluates machine translations (MTs) against human translations (HTs) in children’s literature translation (CLT) from a stylometric perspective. The study constructs a Peter Pan corpus of 21 translations: 7 human translations (HTs), 7 large language model translations (LLMs), and 7 neural machine translation outputs (NMTs). The analysis employs a generic feature set (lexical, syntactic, readability, and n-gram features) and a creative text translation (CTT-specific) feature set covering repetition, rhyme, translatability, and miscellaneous features, for a total of 447 linguistic features. Using machine-learning classification and clustering techniques, we conduct a stylometric analysis of these translations. The results show that, on generic features, HTs and MTs differ significantly in the distribution of conjunctions and in the ratio of the word unigram 一样, while NMTs and LLMs differ significantly in their use of descriptive words and in adverb ratios. On CTT-specific features, LLMs outperform NMTs in distribution, aligning more closely with HTs in stylistic characteristics, which demonstrates the potential of LLMs in CLT.
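As a rough illustration of what a unigram-ratio feature such as the one mentioned for 一样 might look like, the following minimal Python sketch computes the share of tokens in a text that match a target word. It is not the authors' pipeline: the function names, the toy sentences, and the assumption that texts are already word-segmented are all hypothetical and used only for illustration.

```python
from collections import Counter


def unigram_ratio(tokens, target):
    """Return the share of tokens equal to `target` (0.0 for an empty text)."""
    if not tokens:
        return 0.0
    return Counter(tokens)[target] / len(tokens)


def feature_vector(tokens, targets):
    """Build one relative-frequency feature per target word (hypothetical helper)."""
    return [unigram_ratio(tokens, t) for t in targets]


# Invented, pre-segmented toy sentences; not taken from the Peter Pan corpus.
ht_tokens = ["他", "像", "鸟", "一样", "飞"]
mt_tokens = ["他", "飞", "得", "很", "高"]
targets = ["一样", "飞"]

print(feature_vector(ht_tokens, targets))
print(feature_vector(mt_tokens, targets))
```

Vectors of this kind, one per translation, are the sort of input that a classifier or clustering algorithm could then compare across HTs, LLMs, and NMTs.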