Chen, Hongmei and Chang, Suhong and Ye, Wen and Gu, Dongbing and Xu, Qiangwei and Ji, Miaoxin (2026) A Lightweight Multimodal Fusion Method for Object Detection Based on Bird’s Eye View. IEEE Sensors Journal, 26 (2). pp. 2626-2638. DOI https://doi.org/10.1109/jsen.2025.3639093
Abstract
This article proposes a lightweight multimodal fusion object detection algorithm based on bird's eye view (BEV) perception, addressing the challenges of high computational complexity and insufficient feature fusion when integrating camera and light detection and ranging (LiDAR) data in autonomous vehicles. To alleviate the computational burden, depthwise separable convolution and large separable kernel attention (LSKA) are introduced, constructing a lightweight feature extraction network that effectively captures features from camera data. To enhance computational efficiency, a parallel-optimized BEV pooling structure is proposed that improves the computation process and memory access patterns. In the feature fusion stage, a novel dual-modality dual-attention feature fusion module is designed that integrates the features from both modalities using parallel channel attention and spatial attention mechanisms to strengthen correlations between multimodal features. In addition, a weight generation network is introduced to adaptively assign fusion weights to features of the two modalities. The proposed algorithm has a lightweight structure and achieves cross-modal feature alignment in the BEV space. Experimental results on the public nuScenes dataset show that the proposed algorithm achieves an mAP of 0.682 and a nuScenes detection score (NDS) of 0.710, maintaining detection accuracy similar to the baseline model. Furthermore, the algorithm reduces the computational load from 253.2 to 202.3 G MACs, approximately a 20% decrease, and improves the inference speed from 8.4 to 10.0 FPS. The algorithm has also been experimentally validated on the Waymo dataset, demonstrating performance comparable to that of the BEVFusion method. These results demonstrate significant improvements in both computational efficiency and deployment-friendliness while preserving detection performance.
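The abstract attributes much of the MAC reduction to depthwise separable convolution, which factors a standard k × k convolution into a per-channel depthwise convolution followed by a 1 × 1 pointwise convolution. The following is a minimal sketch of the resulting MAC-count arithmetic; the layer shapes below are illustrative assumptions, not the layers used in the paper.

```python
def conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution on an h x w feature map."""
    return h * w * c_in * c_out * k * k

def dws_conv_macs(h, w, c_in, c_out, k):
    """MACs for a depthwise separable convolution:
    a depthwise k x k conv per input channel, then a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

# Illustrative shapes (an assumption for this sketch):
h, w, c_in, c_out, k = 64, 64, 128, 128, 3
standard = conv_macs(h, w, c_in, c_out, k)
separable = dws_conv_macs(h, w, c_in, c_out, k)
print(f"reduction factor: {standard / separable:.1f}x")  # ~8.4x for these shapes
```

The reduction factor for a single layer is 1 / (1/c_out + 1/k²); the paper's overall ~20% saving is smaller because only part of the network is replaced.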
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Attention mechanism; bird’s eye view (BEV); light weighting; multimodal fusion; object detection |
| Subjects: | Z Bibliography. Library Science. Information Resources > ZR Rights Retention |
| Divisions: | Faculty of Science and Health Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
| Depositing User: | Unnamed user with email elements@essex.ac.uk |
| Date Deposited: | 23 Jan 2026 14:32 |
| Last Modified: | 23 Jan 2026 14:36 |
| URI: | http://repository.essex.ac.uk/id/eprint/42307 |
Available files
Filename: A_Lightweight_Multimodal_Fusion_Method_for_Object_Detection_Based_on_Bird_s_Eye_View.pdf
Licence: Creative Commons: Attribution 4.0