Lu, Qiang and Long, Yunfei and Sun, Xia and Feng, Jun and Zhang, Hao (2024) Fact-sentiment Incongruity Combination Network for Multimodal Sarcasm Detection. Information Fusion, 104, 102203. DOI https://doi.org/10.1016/j.inffus.2023.102203
Abstract
Multimodal sarcasm detection aims to identify whether a literal expression contradicts the authentic attitude conveyed within multimodal data. Incongruity-based methods have been applied successfully to multimodal sarcasm detection because they flexibly capture intrinsic differences between modalities. However, previous incongruity methods focused primarily on the semantic level and often overlooked more specific forms of sarcasm incongruity, namely fact incongruity, sentiment incongruity, and combination incongruity. We therefore propose a fact-sentiment incongruity combination network that models multimodal sarcastic relations from a novel perspective, by exploring multimodal factual disparities, sentiment incongruity, and their combined fusion. Specifically, we design a dynamic connecting component that computes dynamic routing probability weights via graph attention and mask routing matrices, selecting the most suitable image-text pairs to capture fact incongruity between images and text. We then retrieve sentiment relations between text tokens and image objects from external sentiment knowledge and use them to reconstruct the edge weights of the cross-modal graph matrix, capturing sentiment incongruity. Furthermore, we introduce a combination incongruity fusion layer and a cross-modal contrastive loss that fuse fact incongruity and sentiment incongruity, further enhancing the incongruity representations. Extensive experiments and analyses on publicly available datasets demonstrate the superiority of the proposed model.
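The cross-modal contrastive loss mentioned in the abstract can be illustrated with an InfoNCE-style objective that pulls matched text-image pairs together and pushes mismatched pairs apart. The sketch below is a minimal assumed illustration, not the paper's actual implementation; the function name, temperature value, and batch construction are all assumptions.

```python
import numpy as np

def cross_modal_contrastive_loss(text_emb, image_emb, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of embeddings.

    Matched (text, image) pairs at the same batch index are treated as
    positives; every other pairing in the batch acts as a negative.
    """
    # L2-normalise each embedding so the dot product is cosine similarity.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature  # (batch, batch) similarity matrix

    # Numerically stable log-softmax over each row; the diagonal entries
    # (matched pairs) are the targets of the cross-entropy.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Well-aligned modalities yield a lower loss than random pairings, which is the signal the fusion layer can exploit when combining fact and sentiment incongruity representations.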
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Combination incongruity fusion; Cross-modal graph; Dynamic connecting component; Multimodal sarcasm detection; Sarcasm incongruity |
| Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
| Depositing User: | Unnamed user with email elements@essex.ac.uk |
| Date Deposited: | 09 Jan 2024 17:18 |
| Last Modified: | 30 Oct 2024 21:13 |
| URI: | http://repository.essex.ac.uk/id/eprint/37290 |
Available files
Filename: FSICN.pdf
Licence: Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
Embargo Date: 20 June 2025