Chen, Rongfei and Zhou, Wenju and Hu, Huosheng and Fei, Zixiang and Fei, Minrui and Zhou, Hao (2024) Disentangled Variational Auto-Encoder for Multimodal Fusion Performance Analysis in Multimodal Sentiment Analysis. Knowledge-Based Systems, 301. p. 112372. DOI https://doi.org/10.1016/j.knosys.2024.112372
Abstract
Multimodal Sentiment Analysis (MSA) is widely applicable owing to its capacity to analyze and interpret users' emotions, feelings, and perspectives by integrating complementary information from multiple modalities. However, inefficient and unbalanced cross-modal information fusion substantially undermines the accuracy and reliability of MSA models. A critical challenge in the field therefore lies in effectively assessing the information-integration capabilities of these models to ensure balanced and equitable processing of multimodal data. In this paper, a Disentangled Variational Auto-Encoder (DVAE) is proposed for systematically assessing fusion performance and investigating the factors that facilitate multimodal fusion. Specifically, a distribution constraint module is presented to decouple the fusion matrices and generate multiple low-dimensional, trustworthy disentangled latent vectors that adhere to the authentic unimodal input distributions. In addition, a combined loss term is modified to balance the inductive bias, signal reconstruction, and distribution constraint terms effectively, facilitating the optimization of the network's weights and parameters. With the proposed evaluation method, the fusion performance of multimodal models can be evaluated by contrasting the classification degradation ratios obtained from the disentangled hidden representations and the joint representations. Experiments conducted with eight state-of-the-art multimodal fusion methods on the CMU-MOSI and CMU-MOSEI benchmark datasets demonstrate that DVAE effectively evaluates the effects of multimodal fusion. Moreover, the comparative results indicate that the equalizing effect among the various advanced mechanisms used in multimodal sentiment analysis, as well as the single-peak characteristic of the ground-truth label distribution, both contribute significantly to multimodal data fusion.
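The evaluation recipe the abstract describes can be illustrated with a short sketch. The PyTorch code below is a minimal, hypothetical illustration, not the authors' implementation: the names (`DVAEHead`, `combined_loss`, `degradation_ratio`), the standard-normal prior standing in for the "authentic unimodal input distribution", and the balance weights `alpha` and `beta` are all assumptions made for exposition. It shows the three ingredients the abstract names: a distribution-constrained (VAE-style) head that decouples a joint fusion vector into per-modality latents, a combined loss balancing reconstruction, distribution constraint, and a classification (inductive-bias) term, and the degradation-ratio comparison between joint and disentangled representations.

```python
# Minimal sketch of the DVAE evaluation idea (illustrative names and priors).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DVAEHead(nn.Module):
    """Decouple a joint fusion vector into per-modality latent vectors.
    A KL term (below) constrains the latents toward a standard-normal prior,
    a stand-in assumption for the unimodal input distributions in the paper."""
    def __init__(self, fused_dim=128, latent_dim=32, n_modalities=3):
        super().__init__()
        self.to_mu = nn.Linear(fused_dim, latent_dim * n_modalities)
        self.to_logvar = nn.Linear(fused_dim, latent_dim * n_modalities)
        self.decoder = nn.Linear(latent_dim * n_modalities, fused_dim)

    def forward(self, fused):
        mu = self.to_mu(fused)
        logvar = self.to_logvar(fused)
        # Reparameterization trick: sample latents differentiably.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.decoder(z)  # reconstruct the joint representation
        return z, mu, logvar, recon

def combined_loss(recon, fused, mu, logvar, logits, labels, alpha=1.0, beta=0.1):
    """Signal reconstruction + distribution constraint (KL) + inductive-bias
    (classification) terms; alpha/beta are illustrative balance weights."""
    rec = F.mse_loss(recon, fused)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    cls = F.cross_entropy(logits, labels)
    return rec + beta * kl + alpha * cls

def degradation_ratio(acc_joint, acc_disentangled):
    """Fusion-performance score: relative accuracy drop when classifying from
    the disentangled latents instead of the joint representation."""
    return (acc_joint - acc_disentangled) / acc_joint

# Usage sketch with random stand-in data.
head = DVAEHead()
clf = nn.Linear(32 * 3, 2)            # classifier over the disentangled latents
fused = torch.randn(8, 128)           # joint representation from a fusion model
labels = torch.randint(0, 2, (8,))
z, mu, logvar, recon = head(fused)
loss = combined_loss(recon, fused, mu, logvar, clf(z), labels)
loss.backward()
```

Under this reading, a fusion model whose joint representation can be decoupled with little classification degradation has integrated the modalities in a balanced way, whereas a large degradation ratio suggests the joint representation leans on entangled or unbalanced cues.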
| Field | Value |
| --- | --- |
| Item Type | Article |
| Uncontrolled Keywords | Multimodal sentiment analysis; Model performance evaluation; Disentangled representation learning |
| Divisions | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| Date Deposited | 15 Aug 2024 10:50 |
| Last Modified | 30 Oct 2024 21:18 |
| URI | http://repository.essex.ac.uk/id/eprint/38976 |
Available files
Filename: Accepted_Manuscript.pdf
Licence: Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
Embargo Date: 10 August 2025