An, Yi and Shi, Jin and Gu, Dongbing and Liu, Qiang (2022) Visual-LiDAR SLAM Based on Unsupervised Multi-channel Deep Neural Networks. Cognitive Computation, 14 (4). pp. 1496-1508. DOI https://doi.org/10.1007/s12559-022-10010-w
Abstract
Recently, deep learning techniques have been applied to visual and light detection and ranging (LiDAR) simultaneous localization and mapping (SLAM) problems. Supervised deep learning SLAM methods require ground-truth data for training, but collecting such data is costly and labour-intensive. Some visual or LiDAR SLAM methods have adopted unsupervised training strategies; however, these methods exploit only a single sensor modality and thus forgo the complementary advantages of LiDAR and visual data. In this paper, we propose a novel unsupervised multi-channel visual-LiDAR SLAM method (MVL-SLAM) that fuses visual and LiDAR data. Our SLAM system consists of an unsupervised multi-channel visual-LiDAR odometry (MVLO) component, a deep learning–based loop closure detection component, and a 3D mapping component. The visual-LiDAR odometry component adopts a multi-channel recurrent convolutional neural network (RCNN) whose input consists of RGB images together with front, left, and right view depth images generated from 360° 3D LiDAR data. The loop closure detection component uses features from a deep convolutional neural network (CNN). Our SLAM method requires no ground-truth data for training and constructs environmental 3D maps directly from the 3D mapping component. Experiments on the KITTI odometry dataset show lower rotation and translation errors than several other unsupervised methods, including UnMono, SfmLearner, DeepSLAM, and UnDeepVO. By fusing visual and LiDAR data, MVL-SLAM achieves higher accuracy and more robust pose estimation than single-modal SLAM systems.
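The abstract mentions generating front, left, and right view depth images from 360° 3D LiDAR data as network input. A minimal sketch of one such projection (a single front view via spherical mapping) is shown below; the field-of-view bounds, image resolution, and function name are illustrative assumptions, not the authors' actual settings.

```python
import numpy as np

def lidar_to_depth_image(points, h_fov=(-45.0, 45.0), v_fov=(-24.9, 2.0),
                         width=128, height=64):
    """Project 3D LiDAR points (N, 3) into a front-view depth image.

    Illustrative sketch only: FOVs and resolution here are assumed,
    not taken from the MVL-SLAM paper.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)            # range (depth) per point
    azimuth = np.degrees(np.arctan2(y, x))     # horizontal angle
    elevation = np.degrees(np.arcsin(z / np.maximum(r, 1e-9)))

    # Keep only points inside the chosen field of view
    mask = ((azimuth >= h_fov[0]) & (azimuth < h_fov[1]) &
            (elevation >= v_fov[0]) & (elevation < v_fov[1]) & (r > 0))
    azimuth, elevation, r = azimuth[mask], elevation[mask], r[mask]

    # Map angles to pixel coordinates
    u = ((azimuth - h_fov[0]) / (h_fov[1] - h_fov[0]) * width).astype(int)
    v = ((v_fov[1] - elevation) / (v_fov[1] - v_fov[0]) * height).astype(int)
    u = np.clip(u, 0, width - 1)
    v = np.clip(v, 0, height - 1)

    # Write far points first so the nearest point wins in each pixel
    depth = np.zeros((height, width), dtype=np.float32)
    order = np.argsort(-r)
    depth[v[order], u[order]] = r[order]
    return depth
```

Left and right views would follow by shifting `h_fov` by ±90°; the resulting depth images and the RGB frame would form the multi-channel input described in the abstract.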
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Unsupervised deep learning; Multi-channel RCNN; Visual-LiDAR SLAM; Sensor fusion |
| Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
| Depositing User: | Unnamed user with email elements@essex.ac.uk |
| Date Deposited: | 29 Sep 2022 16:20 |
| Last Modified: | 30 Oct 2024 15:51 |
| URI: | http://repository.essex.ac.uk/id/eprint/33580 |
Available files
Filename: s12559-022-10010-w.pdf
Licence: Creative Commons: Attribution 3.0