为了实现快速和自动的车辆外观检测,提出一种基于深度学习的车检图像多目标检测与识别方法。首先,采用轻量级神经网络YOLOv3实现车检图像中车头、轮胎、车牌及三角形标志的检测与识别;其次,采用多任务级联卷积神经网络实现车牌4个关键点定位;再次,利用车牌4个关键点坐标,结合目标车牌图像高宽先验,通过透视变换对车牌进行校正;最后,设计卷积神经网络实现车牌底色分类,同时设计卷积循环神经网络,实现车牌字符识别。实验结果表明,在816×612的车检图像上,该方法中端到端的多目标检测与识别的平均精度达98.03%;为便于在车检场景下应用该模型,利用阿里巴巴推理引擎将模型部署到CPU端,使多目标检测与识别的平均速度达10帧/s,从而满足车检的应用需求。
A deep-learning-based multi-target detection and recognition method for vehicle inspection images is proposed to make vehicle appearance inspection fast and automatic. First, a lightweight YOLOv3 network detects and recognizes the vehicle head, tires, license plate, and triangle marks in a vehicle inspection image. Second, a multi-task cascaded convolutional neural network locates the four key points of the license plate. Third, using the coordinates of the four key points together with a height-width prior of the target plate image, the license plate is rectified by a perspective transformation. Finally, a convolutional neural network is designed to classify the background color of the plate, and a convolutional recurrent neural network is designed for license plate character recognition. Experimental results show that the average accuracy of end-to-end multi-target detection and recognition reaches 98.03% on 816×612 vehicle inspection images. To facilitate deployment in vehicle inspection scenarios, the model is ported to the CPU with Alibaba's inference engine, where multi-target detection and recognition runs at an average of 10 frames per second, meeting the application requirements of vehicle inspection.
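The perspective-correction step above can be sketched as follows: from the four detected plate corners and the plate's height-width prior, an 8-DOF homography is solved and used to map the skewed plate onto an upright rectangle. This is a minimal numpy sketch with illustrative corner coordinates (not values from the paper); the 220×70 target size assumes half the 440 mm × 140 mm physical plate dimensions, and a production pipeline would typically use OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective` instead.

```python
import numpy as np

def homography_from_points(src, dst):
    """Solve the 8-DOF homography H (with h33 = 1) that maps the four
    source points onto the four destination points (DLT, 4 pairs)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence (x, y) -> (u, v) contributes two rows.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_points(H, pts):
    """Apply homography H to an (n, 2) array of (x, y) points."""
    pts = np.asarray(pts, float)
    homo = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homo[:, :2] / homo[:, 2:3]

# Four plate corners in the inspection image (clockwise from top-left;
# illustrative values), mapped onto an upright 220x70 plate rectangle.
src = [(312.0, 405.0), (508.0, 398.0), (512.0, 462.0), (315.0, 470.0)]
dst = [(0.0, 0.0), (220.0, 0.0), (220.0, 70.0), (0.0, 70.0)]
H = homography_from_points(src, dst)
```

Once `H` is known, every pixel of the rectified plate is sampled from the original image through the inverse mapping, which removes the perspective skew before color classification and character recognition.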
[1] Wen Y, Lu Y, Yan J, et al. An algorithm for license plate recognition applied to intelligent transportation system[J]. IEEE Transactions on Intelligent Transportation Systems, 2011, 12(3):830-845.
[2] Ojala T, Pietikainen M, Maenpaa T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7):971-987.
[3] Bay H, Ess A, Tuytelaars T, et al. Speeded-up robust features (SURF)[J]. Computer Vision and Image Understanding, 2008, 110(3):346-359.
[4] Viola P, Jones M. Rapid object detection using a boosted cascade of simple features[C]//2001 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2001). Kauai, HI, United states:IEEE, 2001:511-518.
[5] Zhu Q, Yeh M C, Cheng K T, et al. Fast human detection using a cascade of histograms of oriented gradients[C]//2006 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2006). New York:IEEE, 2006:1491-1498.
[6] 魏亭, 邱实, 李晨, 等. 计算机多尺度辅助定位车牌算法[J]. 电子学报, 2018, 46(9):2188-2193. Wei T, Qiu S, Li C, et al. License plate algorithm based on computer multi-scale assist[J]. Acta Electronica Sinica, 2018, 46(9):2188-2193. (in Chinese)
[7] 张国云, 向灿群, 吴健辉, 等. 基于改进BP网络的车牌字符识别方法研究[J]. 计算机应用与软件, 2017, 34(4):243-248. Zhang G Y, Xiang C Q, Wu J H, et al. Research on license plate character recognition method based on improved BP neural network[J]. Computer Applications and Software, 2017, 34(4):243-248. (in Chinese)
[8] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786):504-507.
[9] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2014:580-587.
[10] Girshick R. Fast R-CNN[C]//IEEE International Conference on Computer Vision. Santiago:IEEE, 2015:1440-1448.
[11] Ren S, He K, Girshick R, et al. Faster R-CNN:towards real-time object detection with region proposal networks[C]//Advances in Neural Information Processing Systems, 2015:91-99.
[12] Liu W, Anguelov D, Erhan D, et al. SSD:single shot multibox detector[C]//European Conference on Computer Vision. Cham:Springer, 2016:21-37.
[13] Redmon J, Divvala S, Girshick R, et al. You only look once:unified, real-time object detection[C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas:IEEE, 2016:779-788.
[14] 史建伟, 章韵. 基于改进YOLOv3和BGRU的车牌识别系统[J]. 计算机工程与设计, 2020, 41(8):2345-2351. Shi J W, Zhang Y. License plate recognition system based on improved YOLOv3 and BGRU[J]. Computer Engineering and Design, 2020, 41(8):2345-2351. (in Chinese)
[15] 吕石磊, 卢思华, 李震, 等. 基于改进YOLOv3-LITE轻量级神经网络的柑橘识别方法[J]. 农业工程学报, 2019, 35(17):205-214. Lü S L, Lu S H, Li Z, et al. Orange recognition method using improved YOLOv3-LITE lightweight neural network[J]. Transactions of the Chinese Society of Agricultural Engineering, 2019, 35(17):205-214. (in Chinese)
[16] Zhang K P, Zhang Z P, Li Z F, et al. Joint face detection and alignment using multitask cascaded convolutional networks[J]. IEEE Signal Processing Letters, 2016, 23(10):1499-1503.
[17] Shi B G, Bai X, Yao C, et al. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(11):2298-2304.
[18] 王孟轩, 张胜, 王月, 等. 改进的C-RNN模型在警情文本分类中的研究与应用[J]. 应用科学学报, 2020, 38(3):388-400. Wang M X, Zhang S, Wang Y, et al. Research and application of improved C-RNN model in classification of alarm texts[J]. Journal of Applied Sciences, 2020, 38(3):388-400. (in Chinese)
[19] Sandler M, Howard A, Zhu M, et al. Mobilenetv2:inverted residuals and linear bottlenecks[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2018:4510-4520.
[20] Redmon J, Farhadi A. YOLOv3:an incremental improvement[J/OL]. Computer Vision and Pattern Recognition[2020-12-28]. https://arxiv.org/abs/1804.02767.
[21] Jiang X T, Wang H, Chen Y L, et al. MNN:a universal and efficient inference engine[J/OL]. Computer Vision and Pattern Recognition[2020-12-28]. https://www.researchgate.net/publication/339616189.
[22] Zherzdev S, Gruzdev A. LPRNet:license plate recognition via deep neural networks[J/OL]. Computer Vision and Pattern Recognition[2020-12-28]. https://arxiv.org/abs/1806.10447v1.
[23] Xu Z, Yang W, Meng A, et al. Towards end-to-end license plate detection and recognition:a large dataset and baseline[C]//Proceedings of the European Conference on Computer Vision, 2018:255-271.