Explicit modality weighting with unimodal supervision for categorical multimodal emotion recognition

Publication type: A1
Publication status: In press
Authors: Du, Q., De Langhe, L., Lefever, E., & Hoste, V.
Journal: IEEE TRANSACTIONS ON AFFECTIVE COMPUTING

Abstract

Categorical multimodal emotion recognition (MER) aims to infer discrete emotional states by integrating heterogeneous signals such as text, speech, and visual expressions. A persistent challenge in this setting lies in handling cross-modal inconsistency, where different modalities convey divergent emotional cues. While existing MER models often rely on attention mechanisms or implicit interaction layers to address this issue, modality contributions are typically learned in an opaque manner and are rarely directly supervised. Leveraging the unique annotation scheme of the UniC dataset, which provides parallel unimodal and multimodal categorical emotion labels, this paper presents an investigation of how unimodal emotion supervision can be explicitly incorporated into multimodal learning. We examine representative late fusion and tensor fusion strategies and propose an explicit, per-sample modality weighting framework built upon multitask tensor fusion. The proposed method derives modality importance from unimodal–multimodal label disagreement during training and learns to predict modality weights at inference time without relying on attention mechanisms. Experiments on the UniC dataset demonstrate that explicit modality weighting consistently improves performance and stability over strong multimodal baselines, achieving the highest average accuracy under seven-class emotion recognition. Additional evaluations on the CH-SIMS dataset further confirm the generalisability of the approach. Beyond performance gains, the weighting design enables modality reliance analysis, offering interpretable insights into emotion-specific modality dependencies. This study provides one of the earliest systematic investigations into supervised modality weighting for robust and interpretable categorical MER.
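
As an illustration of how modality importance might be derived from unimodal–multimodal label disagreement, the following minimal sketch turns one sample's labels into training-time target weights. The agreement rule, the uniform fallback, and the function name `agreement_weights` are assumptions made for this example; the abstract does not specify the actual formulation used in the paper.

```python
from typing import Dict


def agreement_weights(unimodal: Dict[str, str], multimodal: str) -> Dict[str, float]:
    """Hypothetical rule: modalities whose unimodal label matches the
    multimodal label share the total weight equally; the rest get zero."""
    agree = {m: 1.0 if label == multimodal else 0.0 for m, label in unimodal.items()}
    total = sum(agree.values())
    if total == 0:
        # No modality agrees with the multimodal label:
        # fall back to uniform weights (an assumption of this sketch).
        return {m: 1.0 / len(unimodal) for m in unimodal}
    return {m: a / total for m, a in agree.items()}


# One sample: text disagrees with the multimodal label, audio and visual agree.
sample = {"text": "neutral", "audio": "anger", "visual": "anger"}
weights = agreement_weights(sample, "anger")
```

In a setup of this kind, such targets would only be available during training, where the unimodal labels exist; a separate predictor would have to be trained to estimate the weights from the input features for use at inference time.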
