Computer Science and Applications

Lane Line Detection Based on CNN and Transformer Hybrid Network

  • 1. School of Information and Software Engineering, East China Jiaotong University, Nanchang 330013, Jiangxi, China;
    2. Jiangxi Transportation Institute Co., Ltd., Nanchang 330038, Jiangxi, China

Received date: 2022-11-08

  Online published: 2024-09-29

Funding

Supported by the National Natural Science Foundation of China (No. 52062016), the Jiangxi Province 03 Special Project (No. 20203ABC03W07), the General Program of the Natural Science Foundation of Jiangxi Province (No. 20212BAB202009), the Natural Science Foundation of Jiangxi Province (No. 20212BAB202004), and the Science Foundation of the Jiangxi Provincial Department of Education (No. GJJ190319)


Abstract

Lane detection plays an important role in autonomous driving systems. Current deep learning-based lane detection methods typically extract features with a backbone network and then separately estimate the confidence of key points on the lane lines and the offsets of these points relative to the lane starting point. However, because lane lines are thin, elongated structures, existing backbone networks cannot effectively extract such structural features, and offset networks struggle to regress the offsets of key points relative to the starting point. Given the superior ability of attention mechanisms to extract spatial structural features and to model long-range dependencies across image sequences, this paper proposes a CNN-Transformer hybrid network (CTNet) built on a point-based lane detection approach. CTNet improves feature representation through a feature pyramid and an enhanced coordinate attention mechanism, and uses a vision Transformer-based offset network to regress the offsets of key points. CTNet can therefore extract elongated lane line features and capture offsets between distant points, effectively improving lane detection accuracy. Experiments compare CTNet with six commonly used lane detection algorithms on the TuSimple and CULane datasets: on TuSimple, CTNet outperforms existing methods on all accuracy metrics, and across the nine lane scenarios of CULane, CTNet achieves the best accuracy in six.
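The coordinate attention mechanism that CTNet builds on (Hou et al. [10]) factorizes spatial attention into two directional poolings, one along the height axis and one along the width axis, which suits thin, elongated structures such as lane lines. Below is a minimal illustrative sketch, not the paper's enhanced variant: the weight matrices `w_h` and `w_w` are hypothetical per-channel linear maps standing in for the module's 1×1 convolutions, and the intermediate normalization and nonlinearity are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w_h, w_w):
    """Simplified coordinate attention (after Hou et al. [10]).

    x:   feature map of shape (C, H, W)
    w_h: (C, C) linear map standing in for the 1x1 conv on the height branch
    w_w: (C, C) linear map standing in for the 1x1 conv on the width branch
    """
    pool_h = x.mean(axis=2)        # (C, H): average-pool along the width axis
    pool_w = x.mean(axis=1)        # (C, W): average-pool along the height axis
    att_h = sigmoid(w_h @ pool_h)  # (C, H): attention weight per row
    att_w = sigmoid(w_w @ pool_w)  # (C, W): attention weight per column
    # Re-weight the feature map along both spatial directions
    return x * att_h[:, :, None] * att_w[:, None, :]

# Usage: a random 8-channel 16x32 feature map with identity weight maps
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 32))
y = coordinate_attention(x, np.eye(8), np.eye(8))
print(y.shape)  # (8, 16, 32)
```

Because each attention map is pooled along one axis only, a strong response anywhere in a row or column modulates that entire row or column, which is what lets this family of mechanisms follow long, thin lane structures better than purely local convolutions.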

Cite this article

Tang Hong, Deng Feng, Zhang Kai, Nie Xuefang, Li Guanghui. Lane line detection based on CNN and Transformer hybrid network [J]. Journal of Applied Sciences, 2024, 42(5): 871-883. DOI: 10.3969/j.issn.0255-8297.2024.05.013


References

[1] Pan X, Shi J, Luo P, et al. Spatial as deep: spatial CNN for traffic scene understanding [DB/OL]. 2017[2022-11-08]. https://arxiv.org/abs/1712.06080.
[2] Hou Y N, Ma Z, Liu C X, et al. Learning lightweight lane detection CNNs by self attention distillation [C]//IEEE/CVF International Conference on Computer Vision, 2019: 1013-1021.
[3] Tabelini L, Berriel R, Paixao T M, et al. Keep your eyes on the lane: real-time attention-guided lane detection [C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 294-302.
[4] Wang J S, Ma Y C, Huang S F, et al. A keypoint-based global association network for lane detection [C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 1382-1391.
[5] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need [C]//The 31st International Conference on Neural Information Processing Systems, 2017: 6000-6010.
[6] Qu Z, Jin H, Zhou Y, et al. Focus on local: detecting lane marker from bottom up via key point [C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 14117-14125.
[7] Munir F, Azam S, Jeon M, et al. LDNet: end-to-end lane marking detection approach using a dynamic vision sensor [J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23: 9318-9334.
[8] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition [C]//IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[9] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection [C]//IEEE Conference on Computer Vision and Pattern Recognition, 2017: 936-944.
[10] Hou Q B, Zhou D Q, Feng J S. Coordinate attention for efficient mobile network design [C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 13708-13717.
[11] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: transformers for image recognition at scale [DB/OL]. 2020[2022-12-01]. http://arxiv.org/abs/2010.11929.
[12] Hu J, Shen L, Sun G. Squeeze-and-excitation networks [C]//IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7132-7141.
[13] Park J, Woo S, Lee J Y, et al. BAM: bottleneck attention module [DB/OL]. 2018[2022-11-08]. http://arxiv.org/abs/1807.06514.
[14] Woo S, Park J, Lee J Y, et al. CBAM: convolutional block attention module [C]//European Conference on Computer Vision, 2018: 3-19.
[15] Sandler M, Howard A, Zhu M L, et al. MobileNetV2: inverted residuals and linear bottlenecks [C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 4510-4520.
[16] Wu H P, Xiao B, Codella N, et al. CvT: introducing convolutions to vision transformers [C]//IEEE/CVF International Conference on Computer Vision, 2021: 22-31.
[17] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection [C]//IEEE International Conference on Computer Vision, 2017: 2999-3007.
[18] Qin Z Q, Wang H Y, Li X. Ultra fast structure-aware deep lane detection [C]//European Conference on Computer Vision, 2020: 276-291.