This article presents the novel manually annotated Trilingual Recognition of Irony with Confidence (TRIC) dataset for English, Dutch and Italian, publicly available on Hugging Face as Amala3/TRIC. The annotations in this dataset include irony likelihood labels, indicating how likely annotators believe a text to be ironic, as well as trigger words, indicating which words in a sentence are essential for understanding the irony. In addition to the dataset, this work investigates the development of confidence-aware models for irony detection in monolingual and multilingual setups. Results show that finetuning encoder-only models with confidence-aware labels improves performance on binary irony detection, and that finetuning on task-specific data in multiple languages yields further gains. Comparison with finetuned Llama3 indicates that generative decoder-only models perform better than confidence-aware models for English, but that encoder-only models perform best for the less-resourced languages (Dutch and Italian). Analysis of the trigger words identified by humans and by automatic systems suggests that token-level importance differs significantly between the two, but that n-gram-based clustering can reveal deeper insights. In all three languages, automatic systems tend to rely more on hyperbolic positive sentiment and interjections, whereas humans more often identify topics that are relevant to understanding the irony.