Target Counting Method Based on UAV View in Large Area Scenes

doi:10.3969/j.issn.0255-8297.2024.01.006

Abstract

Abstract: In recent years, unmanned aerial vehicles (UAVs) have been widely used in crowd counting because of their high flexibility and maneuverability. However, most existing crowd counting methods rely on a single viewpoint, and few studies address multi-view counting in large-area, multi-camera scenes. To address this gap, this paper proposes a UAV-based target counting method that accurately counts the targets in a scene. Data were collected in a coastal area. Deep learning techniques are applied to the captured images for target detection and for image stitching and fusion; the detection results are then mapped onto the stitched image, and a counting algorithm completes the counting task for the whole region. Experiments on a public dataset and on the dataset constructed for this paper validate the effectiveness of the detection-based counting algorithm.

Keywords: unmanned aerial vehicle (UAV), high-resolution images, object detection, image stitching, multi-view object counting
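The mapping-and-counting step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say how detections in overlapping regions are reconciled, so the greedy distance-based merge below is an assumption, as are the function names, the `merge_radius` parameter, and the premise that the stitching stage supplies one 3x3 homography per image.

```python
import numpy as np

def project_points(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float).reshape(-1, 2)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ np.asarray(H, dtype=float).T
    # Divide by the homogeneous coordinate to return to pixel coordinates.
    return mapped[:, :2] / mapped[:, 2:3]

def count_in_mosaic(detections, homographies, merge_radius=10.0):
    """Count targets after mapping per-image box centers into the mosaic.

    detections:   list (one entry per image) of (N_i, 4) boxes [x1, y1, x2, y2]
    homographies: list of 3x3 matrices mapping each image into mosaic
                  coordinates (assumed to come from the stitching stage)
    merge_radius: hypothetical pixel distance below which two mapped centers
                  are treated as the same target
    """
    kept = []
    for boxes, H in zip(detections, homographies):
        boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
        centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
        for c in project_points(H, centers):
            # Greedy merge: keep a center only if no kept center is nearby.
            if all(np.linalg.norm(c - k) > merge_radius for k in kept):
                kept.append(c)
    return len(kept), np.array(kept).reshape(-1, 2)
```

Because the merge is greedy and purely distance-based, it also fuses two distinct targets whose centers fall within `merge_radius` of each other, so the radius would need tuning to the target size and ground resolution of a given flight.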

CLC number: TP391

XIE Ting, ZHANG Shoulong, DING Laihui, XU Zhiwei, YANG Xiaogang, WANG Shengke. Target Counting Method Based on UAV View in Large Area Scenes[J]. Journal of Applied Sciences, 2024, 42(1): 67-82.
