Align MacridVAE: multimodal alignment for disentangled recommendations

Publication type
P1
Publication status
Published
Authors
Avas, I., Allein, L., Laenen, K., & Moens, M.
Editor
Nazli Goharian, Nicola Tonellotto, Yulan He, Aldo Lipani, Graham McDonald, Craig Macdonald and Iadh Ounis
Series
Advances in Information Retrieval, ECIR 2024, Part I
Volume
14608
Pagination
73-89
Publisher
Springer (Cham)
Conference
46th European Conference on Information Retrieval (ECIR 2024) (Glasgow, UK)

Abstract

Explaining why items are recommended to users is challenging, especially when these items are described by multimodal data. Most recommendation systems fail to leverage more than one modality, preferring textual or tabular data. In this work, we propose a new model, Align MacridVAE, which exploits the complementarity of visual and textual item descriptions for item recommendation. The model projects both modalities onto a shared latent space, and a dedicated loss function aligns the text and image of the same item. Item aspects are then jointly disentangled for both modalities, at a macro level to learn interpretable categorical information about items and at a micro level to model user preferences over each of those categories. Experiments are conducted on six item recommendation datasets, and recommendation performance is compared against multiple baseline methods. The results show that our model improves recommendation accuracy by 18% on average in terms of NDCG across the studied datasets, and that it allows user preferences to be visualised by item aspect across modalities together with the learned concept allocation (the code implementation is available at https://github.com/igui/Align-MacridVAE).
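
To make the alignment idea concrete, the following is a minimal sketch, not the paper's actual implementation (see the linked repository for that). It assumes a PyTorch setting and an InfoNCE-style symmetric contrastive formulation of the "dedicated loss function" that aligns the text and image of the same item in the shared latent space; the function name alignment_loss, the temperature parameter, and the encoder outputs text_emb and image_emb are illustrative assumptions.

    # Hedged sketch: item-level text-image alignment loss (assumed contrastive form)
    import torch
    import torch.nn.functional as F

    def alignment_loss(text_emb: torch.Tensor,
                       image_emb: torch.Tensor,
                       temperature: float = 0.1) -> torch.Tensor:
        """Pull together the text and image embeddings of the same item
        (diagonal pairs) and push apart embeddings of different items
        (off-diagonal pairs). Both inputs have shape (batch_size, latent_dim)
        and are projections of the two modalities onto the shared latent space."""
        # Normalise so that dot products are cosine similarities
        text_emb = F.normalize(text_emb, dim=-1)
        image_emb = F.normalize(image_emb, dim=-1)

        # Pairwise similarities between every text/image embedding in the batch
        logits = text_emb @ image_emb.t() / temperature  # (B, B)

        # The matching image for text i is image i
        targets = torch.arange(logits.size(0), device=logits.device)

        # Symmetric cross-entropy: text-to-image and image-to-text directions
        loss_t2i = F.cross_entropy(logits, targets)
        loss_i2t = F.cross_entropy(logits.t(), targets)
        return 0.5 * (loss_t2i + loss_i2t)

In such a setup the alignment term would be added to the model's reconstruction and disentanglement objectives, encouraging the two modality encoders to place the same item at the same point in the shared latent space before the macro- and micro-level disentanglement is applied.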