Road detection models are sensitive to lighting and shadows, which lowers their accuracy and leads to imprecise segmentation of road edges. To address this problem, we propose a road segmentation algorithm that combines a Transformer with a convolutional neural network and takes both RGB images and 3D LiDAR point clouds as input, enabling an autonomous vehicle to perceive precisely the road it is driving on. Experimental results on the KITTI road dataset show that the proposed method compares favorably with existing road detection models in segmentation accuracy.