The iron(ic) melting pot : reviewing human evaluation in humour, irony and sarcasm generation

Publication type: P1
Publication status: Published
Authors: Loakman, T., Maladry, A., & Lin, C.
Editors: Houda Bouamor, Juan Pino and Kalika Bali
Series: Findings of the Association for Computational Linguistics : EMNLP 2023
Pagination: 6676-6689
Publisher: Association for Computational Linguistics
Conference: 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) (Singapore)

Abstract

Human evaluation is often considered to be the gold standard method of evaluating a Natural Language Generation system. However, whilst its importance is accepted by the community at large, the quality of its execution is often brought into question. In this position paper, we argue that the generation of more esoteric forms of language - humour, irony and sarcasm - constitutes a subdomain where the characteristics of selected evaluator panels are of utmost importance, and every effort should be made to report demographic characteristics wherever possible, in the interest of transparency and replicability. We support these claims with an overview of each language form and an analysis of examples in terms of how their interpretation is affected by different participant variables. We additionally perform a critical survey of recent works in NLG to assess how well evaluation procedures are reported in this subdomain, and note a severe lack of open reporting of evaluator demographic information, and a significant reliance on crowdsourcing platforms for recruitment.
