Yin, Xu and Jiang, Jiuchuan and Ge, Sheng and Gan, John Qiang and Wang, Haixian (2025) Aligning machines and minds: Neural encoding for high-level visual cortices based on image captioning task. Journal of Neural Engineering. DOI: https://doi.org/10.1088/1741-2552/ae1164
Abstract
Objective. Neural encoding of visual stimuli aims to predict brain responses in the visual cortex to different external inputs. Deep neural networks (DNNs) trained on relatively simple tasks such as image classification have been widely applied in neural encoding studies of early visual areas. However, due to the complex and abstract nature of semantic representations in high-level visual cortices, their encoding performance and interpretability remain limited. Approach. We propose a novel neural encoding model guided by the image captioning task (ICT). During image captioning, an attention module is employed to focus on key visual objects. In the neural encoding stage, a flexible receptive field (RF) module is designed to simulate voxel-level visual fields. To bridge the domain gap between these two processes, we introduce the Atten-RF module, which effectively aligns attention-guided visual representations with voxel-wise brain activity patterns. Main results. Experiments on the large-scale Natural Scenes Dataset (NSD) demonstrate that our method achieves superior average encoding performance across seven high-level visual cortices, with a mean squared error (MSE) of 0.765, Pearson correlation coefficient (PCC) of 0.443, and coefficient of determination (R²) of 0.245. Significance. By leveraging the guidance and alignment provided by a complex vision-language task, our model enhances the prediction of voxel activity in high-level visual cortex, offering a new perspective on the neural encoding problem. Furthermore, various visualization techniques provide deeper insights into the neural mechanisms underlying visual information processing.
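The record does not include the paper's code, so the following is only a rough illustration of what a voxel-level receptive field (RF) readout can look like. It implements a common pattern from the neural encoding literature (a learnable per-voxel 2D Gaussian pooling over CNN feature maps); the class name `GaussianReadout` and all design details are assumptions for illustration, not the authors' Atten-RF module.

```python
import torch
import torch.nn as nn

class GaussianReadout(nn.Module):
    """Hypothetical per-voxel RF readout: each voxel learns a spatial
    center (mu) and width (sigma) defining a Gaussian pooling window
    over a feature map, plus a linear weight across feature channels."""

    def __init__(self, n_channels: int, n_voxels: int):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(n_voxels, 2))      # RF centers in [-1, 1]
        self.log_sigma = nn.Parameter(torch.zeros(n_voxels))  # log RF widths
        self.weight = nn.Parameter(torch.randn(n_voxels, n_channels) * 0.01)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, channels, H, W) -> predicted responses: (batch, n_voxels)
        b, c, h, w = feats.shape
        ys = torch.linspace(-1, 1, h, device=feats.device)
        xs = torch.linspace(-1, 1, w, device=feats.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack([gx, gy], dim=-1)                  # (H, W, 2)
        sigma = self.log_sigma.exp()[:, None, None]           # (V, 1, 1)
        d2 = ((grid[None] - self.mu[:, None, None, :]) ** 2).sum(-1)  # (V, H, W)
        rf = torch.exp(-0.5 * d2 / sigma ** 2)
        rf = rf / rf.sum(dim=(-2, -1), keepdim=True)          # normalize each RF
        pooled = torch.einsum("bchw,vhw->bvc", feats, rf)     # spatial pooling per voxel
        return torch.einsum("bvc,vc->bv", pooled, self.weight)

# Usage sketch (shapes only):
# readout = GaussianReadout(n_channels=512, n_voxels=1000)
# responses = readout(torch.randn(8, 512, 14, 14))  # -> (8, 1000)
```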
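The three reported metrics (MSE, PCC, R²) are standard. A minimal sketch of how they can be computed voxel-wise, assuming measured and predicted fMRI responses are stacked as (stimuli × voxels) arrays; function and variable names here are hypothetical:

```python
import numpy as np

def voxelwise_metrics(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-8):
    """Per-voxel MSE, Pearson correlation, and coefficient of determination.

    y_true, y_pred: arrays of shape (n_stimuli, n_voxels) holding measured
    and predicted responses, respectively.
    """
    mse = np.mean((y_true - y_pred) ** 2, axis=0)

    # Pearson correlation per voxel (eps guards zero-variance voxels)
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    pcc = (yt * yp).sum(axis=0) / (
        np.sqrt((yt ** 2).sum(axis=0)) * np.sqrt((yp ** 2).sum(axis=0)) + eps
    )

    # Coefficient of determination per voxel
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    r2 = 1.0 - ss_res / (ss_tot + eps)
    return mse, pcc, r2
```

Averaging these per-voxel values across the seven regions of interest would yield summary figures comparable to those quoted in the abstract.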
Item Type: Article
Uncontrolled Keywords: attention; deep neural network; functional magnetic resonance imaging; image caption task; neural encoding
Subjects: Z Bibliography. Library Science. Information Resources > ZR Rights Retention
Divisions: Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of
SWORD Depositor: Unnamed user with email elements@essex.ac.uk
Depositing User: Unnamed user with email elements@essex.ac.uk
Date Deposited: 14 Oct 2025 08:59
Last Modified: 15 Oct 2025 14:29
URI: http://repository.essex.ac.uk/id/eprint/41728
Available files
Filename: Yin+et+al_2025_J._Neural_Eng._10.1088_1741-2552_ae1164.pdf
Licence: Creative Commons: Attribution 4.0