Unmanned aerial vehicle (UAV) images are characterized by high resolution, a wide field of view, and small targets. Existing object detection methods, however, generally fail to extract adequate features for such small targets. To address this problem, this paper proposes a small target detection algorithm. First, to improve feature representation for small targets, CenterNet, a detection network that represents targets by their center points, is adopted, and a deformable dual attention mechanism is introduced. On this basis, because the original non-maximum suppression (NMS) handles nested redundant boxes poorly, a generalized non-maximum suppression (G-NMS) is used to eliminate redundant detections. Finally, the LegoNet convolution unit is introduced to reduce the number of convolution parameters and to balance precision against speed. The main validation data sets used in this paper are VisDrone2019 and UAV_OUC; images in UAV_OUC have higher resolution than those in VisDrone2019. Compared with CenterNet, detection accuracy on UAV_OUC and VisDrone2019 improves by about 10% and 2%, respectively.
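The center-point representation that CenterNet relies on can be illustrated with a minimal sketch. It assumes the network has already produced a per-class heatmap of center scores and simply picks cells that are the maximum of their 3x3 neighbourhood. The function name `find_center_peaks`, the `threshold` and `top_k` parameters, and the toy heatmap are hypothetical choices made for illustration. This is not the implementation used in the paper, and it leaves out the deformable dual attention, G-NMS, and LegoNet components.

```python
from typing import List, Tuple

Peak = Tuple[int, int, float]  # (row, col, score); hypothetical type alias


def find_center_peaks(
    heatmap: List[List[float]],
    threshold: float = 0.3,  # assumed score cut-off, not taken from the paper
    top_k: int = 100,        # assumed cap on the number of returned centers
) -> List[Peak]:
    """Return heatmap cells that are local maxima in their 3x3 window.

    A cell is kept when its score reaches `threshold` and no cell in the
    surrounding 3x3 window (clipped at the borders) has a higher score.
    Results are sorted by score, highest first.
    """
    rows = len(heatmap)
    cols = len(heatmap[0]) if rows else 0
    peaks: List[Peak] = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Collect the 3x3 window around (r, c), including the cell itself.
            window = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            if score >= max(window):
                peaks.append((r, c, score))
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks[:top_k]


# Toy 4x4 heatmap with two well-separated centers.
demo_heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.4, 0.5],
]
print(find_center_peaks(demo_heatmap))
```

In a full detector each retained peak would be combined with regressed width, height, and offset values to form a bounding box. That step is omitted here because the abstract does not describe it.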
WANG Shengke, REN Pengfei, LÜ Xin, ZHUANG Xinfa. Small Target Detection Algorithm of UAV High Resolution Image Based on Center Point and Dual Attention Mechanism[J]. Journal of Applied Sciences, 2021, 39(4): 650-659.
DOI: 10.3969/j.issn.0255-8297.2021.04.012