Embedding analogies for evaluating emotion in LLM-generated utterances

Publication type
C1
Publication status
Published
Authors
Jafari, S., Lefever, E., & Hoste, V.
Editor
Myra Spiliopoulou, Sławomir Nowaczyk, Marco Ragni and Jerzy Stefanowski
Series
Proceedings of the 2025 Workshop on AI for Understanding Human Behavior in Professional Settings (BEHAIV 2025), co-located with the 28th European Conference on Artificial Intelligence (ECAI 2025)
Pagination
26-41
Conference
2025 Workshop on 'AI for Understanding Human Behavior in Professional Settings' (BEHAIV 2025) at the 28th European Conference on Artificial Intelligence (ECAI 2025), Bologna, Italy
Abstract

Emotion plays a vital role in human communication, shaping not only language but also vocal tone, facial expression, and body posture. In the context of emotionally expressive text generation, the lack of reliable evaluation metrics remains a key challenge. This paper introduces a two-step evaluation framework using embedding analogy-based metrics to assess the emotional expressiveness of large language models (LLMs). In the first step, we evaluate the model's ability to neutralize emotional content in a given text while preserving its semantic meaning. In the second step, we test the model's capacity to reinject the intended emotion into the neutralized text. Our experiments demonstrate that GPT-4.1 outperforms other models in both semantic retention and emotional reconstruction, while llama-3.3-70b-instruct performs best among open-source models. This work lays the foundation for future research on cross-modal affective computing, aiming to build emotionally intelligent agents capable of nuanced and empathetic communication across text, speech, and video.
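
The abstract does not spell out the metric itself, so the sketch below shows one plausible reading of an embedding analogy-based evaluation of the two steps, in the spirit of classic word-analogy arithmetic. Everything in it is an assumption for illustration: the sentence-transformers model, the example utterances, and the analogy-direction comparison are not taken from the paper.

    # Illustrative sketch only; the paper's exact metric may differ.
    # Assumes the sentence-transformers package and an arbitrary model choice.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def cosine(a, b):
        # Cosine similarity between two embedding vectors.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Step 1: neutralization should preserve the semantics of the original.
    original = "I can't believe you broke my favorite mug, this is infuriating!"
    neutralized = "You broke my favorite mug."  # hypothetical LLM output
    emb_orig, emb_neut = model.encode([original, neutralized])
    semantic_retention = cosine(emb_orig, emb_neut)

    # Step 2: reinjection should move the embedding back along the analogy
    # direction (original - neutral), analogous to word-vector analogies.
    reinjected = "I am absolutely furious that you broke my favorite mug!"  # hypothetical
    emb_rein = model.encode(reinjected)
    emotion_direction = emb_orig - emb_neut
    recovered_direction = emb_rein - emb_neut
    emotional_reconstruction = cosine(emotion_direction, recovered_direction)

    print(f"semantic retention: {semantic_retention:.3f}")
    print(f"emotional reconstruction: {emotional_reconstruction:.3f}")

Under this reading, a high semantic-retention score rewards neutralizations that stay close to the original meaning, while a high emotional-reconstruction score rewards reinjected text whose embedding shift from the neutral version points in the same direction as the original emotional shift.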