Lane detection plays a crucial role in autonomous driving systems. Deep learning-based lane detection methods typically extract features with a backbone network and then estimate, for key points on the lane lines, both a confidence score and an offset relative to a starting point. However, existing backbone networks struggle to capture the features of elongated lane lines, and offset networks have difficulty regressing the offsets of key points along a lane. In this paper, we propose CTNet (CNN-Transformer hybrid network), a hybrid model built on a point-based lane detection approach. CTNet enhances feature representation through a feature pyramid network and an augmented coordinate attention mechanism, and employs a vision-transformer-based offset network to regress the key-point offsets. As a result, CTNet extracts elongated lane-line features, captures long-range dependencies between points, and significantly improves detection accuracy. Experiments on the TuSimple and CULane datasets show that CTNet outperforms six widely used lane detection algorithms across multiple accuracy metrics: it achieves the best results on TuSimple on all evaluation metrics, and the highest accuracy in six of the nine lane scenarios in CULane.
TANG Hong, DENG Feng, ZHANG Kai, NIE Xuefang, LI Guanghui. Lane Line Detection Based on CNN and Transformer Hybrid Network[J]. Journal of Applied Sciences, 2024, 42(5): 871-883.
DOI: 10.3969/j.issn.0255-8297.2024.05.013
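As a concrete illustration of the two components the abstract highlights, below is a minimal PyTorch sketch of (1) a coordinate attention block in the style of Hou et al. (CVPR 2021) and (2) a small transformer-based offset head in which each spatial location can attend to distant locations before its offset is regressed. Module names, channel sizes, depths, and the head layout are illustrative assumptions, not the authors' released implementation; positional encodings are omitted for brevity.

    # Illustrative sketch only: shapes and layer choices are assumptions,
    # not CTNet's published architecture.
    import torch
    import torch.nn as nn

    class CoordinateAttention(nn.Module):
        """Coordinate attention: pool along H and W separately so the
        attention weights retain position along each axis, which suits
        elongated structures such as lane lines."""
        def __init__(self, channels, reduction=32):
            super().__init__()
            mid = max(8, channels // reduction)
            self.conv1 = nn.Conv2d(channels, mid, 1)
            self.bn = nn.BatchNorm2d(mid)
            self.act = nn.ReLU(inplace=True)
            self.conv_h = nn.Conv2d(mid, channels, 1)
            self.conv_w = nn.Conv2d(mid, channels, 1)

        def forward(self, x):
            n, c, h, w = x.shape
            # Direction-aware pooling: (N, C, H, 1) and (N, C, W, 1).
            x_h = x.mean(dim=3, keepdim=True)
            x_w = x.mean(dim=2, keepdim=True).transpose(2, 3)
            y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
            y_h, y_w = torch.split(y, [h, w], dim=2)
            a_h = torch.sigmoid(self.conv_h(y_h))                  # (N, C, H, 1)
            a_w = torch.sigmoid(self.conv_w(y_w.transpose(2, 3)))  # (N, C, 1, W)
            return x * a_h * a_w

    class TransformerOffsetHead(nn.Module):
        """Flattens a feature map into tokens and applies self-attention,
        so the offset at one key point can use long-range context from
        other points on the same lane before regression."""
        def __init__(self, channels, depth=2, heads=4):
            super().__init__()
            layer = nn.TransformerEncoderLayer(
                d_model=channels, nhead=heads,
                dim_feedforward=2 * channels, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
            self.regress = nn.Linear(channels, 2)  # (dx, dy) per location

        def forward(self, feat):
            n, c, h, w = feat.shape
            tokens = feat.flatten(2).transpose(1, 2)  # (N, H*W, C)
            tokens = self.encoder(tokens)
            offsets = self.regress(tokens)            # (N, H*W, 2)
            return offsets.transpose(1, 2).reshape(n, 2, h, w)

    if __name__ == "__main__":
        feat = torch.randn(1, 64, 40, 100)  # e.g. one FPN level
        feat = CoordinateAttention(64)(feat)
        print(TransformerOffsetHead(64)(feat).shape)  # torch.Size([1, 2, 40, 100])

The two pieces match the abstract's motivation: direction-aware pooling preserves positional information along each image axis, helping the backbone features represent long, thin lanes, while self-attention in the offset head is what supplies the long-range context that plain convolutional offset regression lacks.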