Object Detection Based on Nonlinear Gaussian Squared Distance Loss

doi:10.3969/j.issn.0255-8297.2024.01.001


Abstract: Existing loss functions based on intersection over union (IoU) have limitations that restrict the accuracy and stability of bounding box regression in object detection. To address this problem, a bounding box regression loss based on a nonlinear Gaussian squared distance is proposed. First, three factors of a bounding box, namely overlap, center point distance and aspect ratio, are jointly considered, and the bounding box is modeled as a Gaussian distribution. Then, a Gaussian squared distance is proposed to measure the distance between two such distributions. Finally, a nonlinear function that follows the desired optimization trend is designed to transform the Gaussian squared distance into a loss function that facilitates neural network learning. Experimental results show that, compared with the IoU loss, the proposed method improves the mean average precision (mAP) of Mask R-CNN (mask region-based convolutional neural network), FCOS (fully convolutional one-stage object detector) and ATSS (the adaptive training sample selection detector) by 0.3%, 1.1% and 2.3%, respectively. These results demonstrate that the proposed method effectively improves object detection performance and benefits the regression of high-precision bounding boxes.

Key words: object detection, bounding box regression, Gaussian distribution, intersection over union (IoU), convolutional neural network
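The abstract describes three steps (Gaussian modeling of boxes, a Gaussian squared distance, and a nonlinear mapping to a loss) without giving formulas, so the Python sketch below only illustrates that pipeline under stated assumptions. Each box (x1, y1, x2, y2) is assumed to map to a 2-D Gaussian with mean at the box center and diagonal covariance diag((w/2)^2, (h/2)^2). The squared 2-Wasserstein distance between two such Gaussians stands in for the paper's Gaussian squared distance, and 1 - exp(-sqrt(d2)/c) with a hypothetical constant c stands in for the paper's nonlinear function. All function names and the value of c are illustrative, not taken from the paper.

```python
import math
from typing import Tuple

# Boxes are (x1, y1, x2, y2) in corner format.
Box = Tuple[float, float, float, float]


def box_to_gaussian(box: Box) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Model a box as a 2-D Gaussian (assumed parameterization).

    Mean = box center; covariance = diag((w/2)^2, (h/2)^2), returned here
    as the per-axis standard deviations (w/2, h/2).
    """
    x1, y1, x2, y2 = box
    mean = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    std = ((x2 - x1) / 2.0, (y2 - y1) / 2.0)
    return mean, std


def gaussian_squared_distance(pred: Box, target: Box) -> float:
    """Stand-in distance: squared 2-Wasserstein distance between two
    diagonal Gaussians, ||mu_p - mu_t||^2 + ||sigma_p - sigma_t||^2.

    The first term captures center-point distance; the second captures
    differences in width and height (and therefore aspect ratio).
    """
    (mxp, myp), (sxp, syp) = box_to_gaussian(pred)
    (mxt, myt), (sxt, syt) = box_to_gaussian(target)
    center_term = (mxp - mxt) ** 2 + (myp - myt) ** 2
    shape_term = (sxp - sxt) ** 2 + (syp - syt) ** 2
    return center_term + shape_term


def nonlinear_gaussian_loss(pred: Box, target: Box, c: float = 12.8) -> float:
    """Hypothetical nonlinear mapping 1 - exp(-sqrt(d2) / c) into [0, 1).

    c is a hypothetical normalizing constant, not a value from the paper.
    """
    d2 = gaussian_squared_distance(pred, target)
    return 1.0 - math.exp(-math.sqrt(d2) / c)


def iou_loss(pred: Box, target: Box) -> float:
    """Baseline IoU loss, 1 - IoU, for comparison."""
    iw = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    ih = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = iw * ih
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    return 1.0 - inter / (area_p + area_t - inter)


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    near = (12.0, 0.0, 22.0, 10.0)  # no overlap, close to gt
    far = (40.0, 0.0, 50.0, 10.0)   # no overlap, far from gt
    for name, box in (("gt", gt), ("near", near), ("far", far)):
        print(name, iou_loss(box, gt), round(nonlinear_gaussian_loss(box, gt), 3))
```

In this example, both shifted boxes have zero overlap with the ground truth, so the IoU loss is 1.0 for each and gives no signal about which prediction is closer. The Gaussian-based stand-in still increases with center distance, which illustrates one well-known limitation of the plain IoU loss that distribution-based losses address.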

CLC number: TP391

Citation: LI Rui, LI Yi. Object Detection Based on Nonlinear Gaussian Squared Distance Loss[J]. Journal of Applied Sciences (应用科学学报), 2024, 42(1): 1-14.
