Jin, Xiaofang and Xiao, Jieyu and Jin, Libiao and Zhang, Xinruo (2024) Residual multimodal Transformer for expression‐EEG fusion continuous emotion recognition. CAAI Transactions on Intelligence Technology, 9 (5). pp. 1290-1304. DOI https://doi.org/10.1049/cit2.12346
Abstract
Continuous emotion recognition predicts emotional states from affective information, with a focus on the continuous variation of emotion. Fusion of electroencephalography (EEG) and facial expression videos has been used in this field, but current research has limitations such as hand‐engineered features and simple integration approaches. Hence, a new continuous emotion recognition model based on the fusion of EEG and facial expression videos is proposed, named the residual multimodal Transformer (RMMT). First, ResNet50 and a temporal convolutional network (TCN) are utilised to extract spatiotemporal features from videos, and the TCN is also applied to the computed EEG frequency power to acquire spatiotemporal EEG features. Then, a multimodal Transformer fuses the spatiotemporal features from the two modalities. Furthermore, a residual connection is introduced to fuse shallow features with deep features, which experiments verify to be effective for continuous emotion recognition. Inspired by knowledge distillation, the authors incorporate a feature‐level loss into the loss function to further enhance network performance. Experimental results show that the RMMT achieves superior performance over other methods on the MAHNOB‐HCI dataset. Ablation studies on the residual connection and the loss function in the RMMT demonstrate that both are effective.
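The two ideas the abstract highlights — a residual connection that adds shallow features to the Transformer's deep output, and a knowledge-distillation-inspired feature-level term added to the prediction loss — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the shapes, the MSE choice for both terms, and the weighting coefficient `alpha` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: T time steps, D feature dimensions (not from the paper).
T, D = 8, 16
shallow = rng.normal(size=(T, D))  # shallow features (before the Transformer)
deep = rng.normal(size=(T, D))     # deep features (Transformer output)

# Residual connection: element-wise addition of shallow and deep features.
fused = deep + shallow

def mse(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

# Feature-level loss in the spirit of knowledge distillation: penalise the
# discrepancy between two feature maps, in addition to the usual regression
# loss on the continuous emotion annotations.
pred = rng.normal(size=T)    # predicted continuous emotion values (stand-in)
target = rng.normal(size=T)  # ground-truth continuous annotations (stand-in)
alpha = 0.5                  # hypothetical weighting coefficient

total_loss = mse(pred, target) + alpha * mse(deep, shallow)
print(fused.shape, total_loss > 0.0)
```

In a real training loop these arrays would be network activations and the combined loss would be backpropagated; the sketch only shows how the residual fusion and the two loss terms compose.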
Item Type: | Article |
---|---|
Uncontrolled Keywords: | facial expression recognition, human‐machine interaction, information fusion, physiology, regression analysis |
Divisions: | Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 14 Oct 2025 11:48 |
Last Modified: | 14 Oct 2025 11:50 |
URI: | http://repository.essex.ac.uk/id/eprint/41623 |
Available files
Filename: CAAI Trans on Intel Tech - 2024 - Jin - Residual multimodal Transformer for expression‐EEG fusion continuous emotion.pdf
Licence: Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0