Fusion of Point-Cloud and Image for Road Segmentation Using CNN and Transformer

Hua Yitan, Huang Yingping, Guo Wenhao

doi:10.3969/j.issn.0255-8297.2024.04.011

Journal of Applied Sciences, 2024, Vol. 42, Issue 4: 695-708


Signal and Information Processing



Funding

Supported by the National Natural Science Foundation of China (No. 62276167) and the Natural Science Foundation of Shanghai (No. 20ZR1437900).



School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China

Received: 2023-02-24

Published online: 2024-08-01



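Cite this article

Hua Yitan, Huang Yingping, Guo Wenhao. Fusion of Point-Cloud and Image for Road Segmentation Using CNN and Transformer[J]. Journal of Applied Sciences, 2024, 42(4): 695-708. DOI: 10.3969/j.issn.0255-8297.2024.04.011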

Abstract

Road detection models are sensitive to lighting and shadows, which lowers accuracy and makes road-edge segmentation imprecise. To address this, we propose a road segmentation algorithm that combines a Transformer with a convolutional neural network (CNN) and takes both RGB images and 3D LiDAR point clouds as input, giving autonomous vehicles precise perception of the road they are driving on. Experiments on the KITTI road dataset show that the proposed method achieves better segmentation accuracy than existing road detection models.

Key words: road detection; semantic segmentation; data fusion; Transformer
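
The abstract compares methods by segmentation accuracy but does not say which metrics are used. As an illustration only, the sketch below computes pixel-wise precision, recall and F-measure for binary road masks, a common way to score road segmentation. The function name `road_f_measure` and the toy masks are hypothetical and are not taken from the paper or from KITTI.

```python
from typing import Dict, Sequence


def road_f_measure(pred: Sequence[Sequence[int]],
                   truth: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Pixel-wise precision, recall and F1 for binary masks (1 = road)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # road pixel correctly labelled as road
            elif p and not t:
                fp += 1      # background pixel labelled as road
            elif t and not p:
                fn += 1      # road pixel that was missed
    # Guard against empty masks to avoid division by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy 3x4 masks (hypothetical data for illustration).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [1, 1, 1, 1]]
pred = [[0, 1, 1, 0],
        [0, 1, 1, 1],
        [0, 1, 1, 1]]
scores = road_f_measure(pred, truth)
print(scores)
```

In this toy case one background pixel is labelled as road and one road pixel is missed, so precision and recall are both 7/8. Errors of this kind typically cluster at road edges, which is the failure mode the abstract targets.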

