Lu, Qiang and Sun, Xia and Long, Yunfei and Zhao, Xiaodi and Wang, Zou and Feng, Jun and Wang, Xuxin (2024) Multimodal Dual Perception Fusion Framework for Multimodal Affective Analysis. Information Fusion, 115, 102747. DOI https://doi.org/10.1016/j.inffus.2024.102747
Abstract
The misuse of social platforms and the difficulty of moderating posted content have led to a surge of negative sentiment, sarcasm, and the rampant spread of fake news. In response, multimodal sentiment analysis, sarcasm detection, and fake news detection based on image and text have recently attracted considerable attention. Because these tasks share semantic and sentiment features and face related fusion challenges in deciphering complex human expressions across modalities, integrating them into a unified framework is expected to simplify research in sentiment analysis and to improve classification tasks that involve both semantic and sentiment modelling. We therefore treat these tasks as integral components of a broader line of research, multimodal affective analysis towards semantics and sentiment, and propose a novel multimodal dual perception fusion framework (MDPF). Specifically, MDPF comprises three core procedures: (1) generating bootstrapping language-image knowledge to enrich the original modality space, and applying cross-modal contrastive learning to align the text and image modalities and capture their underlying semantics and interactions; (2) designing a dynamic connective mechanism to adaptively match image-text pairs, while employing a Gaussian-weighted distribution to intensify semantic sequences; (3) constructing a cross-modal graph to preserve the structured information of both image and text data and to share information between modalities, while introducing sentiment knowledge to refine the edge weights of the graph and capture cross-modal sentiment interaction. We evaluate MDPF on three publicly available datasets across three tasks, and the empirical results demonstrate the superiority of the proposed model.
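To make the alignment step in procedure (1) concrete, the sketch below shows a standard symmetric cross-modal contrastive objective (CLIP/InfoNCE-style) between paired text and image embeddings. This is a minimal illustration assuming PyTorch, not the authors' MDPF implementation; the function and variable names are hypothetical.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired text/image embeddings.

    text_emb, image_emb: (batch, dim) tensors; row i of each is a matched pair.
    Matched pairs are pulled together and mismatched pairs pushed apart in the
    shared embedding space.
    """
    # L2-normalise so dot products are cosine similarities.
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are the true pairs.
    logits = text_emb @ image_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Text-to-image and image-to-text cross-entropy, averaged.
    loss_t2i = F.cross_entropy(logits, targets)
    loss_i2t = F.cross_entropy(logits.t(), targets)
    return (loss_t2i + loss_i2t) / 2
```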
Item Type: Article
Uncontrolled Keywords: fake news detection; multimodal affective analysis; multimodal dual perception fusion; multimodal sentiment analysis; sarcasm detection
Divisions: Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of
SWORD Depositor: Unnamed user with email elements@essex.ac.uk
Depositing User: Unnamed user with email elements@essex.ac.uk
Date Deposited: 16 Oct 2024 13:42
Last Modified: 27 Nov 2024 19:08
URI: http://repository.essex.ac.uk/id/eprint/39412
Available files
Filename: MDPF.pdf
Licence: Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
Embargo Date: 22 October 2025