Existing loss functions based on intersection over union (IoU) have limitations that reduce the accuracy and stability of bounding box regression in object detection. To address this problem, a bounding box regression loss based on a nonlinear Gaussian squared distance is proposed. First, three factors of the bounding box (overlap, center point distance, and aspect ratio) are considered jointly, and the bounding box is modeled as a Gaussian distribution. Then, a Gaussian squared distance is proposed to measure the distance between two such distributions. Finally, a nonlinear function is designed to transform the Gaussian squared distance into a loss function that facilitates neural network learning. Experimental results show that, compared with IoU loss, the proposed method improves the mean average precision of the mask region-based convolutional neural network (Mask R-CNN), the fully convolutional one-stage object detector (FCOS), and the adaptive training sample selection (ATSS) object detector by 0.3%, 1.1%, and 2.3%, respectively. These results demonstrate the effectiveness of the proposed method in improving object detection performance and supporting high-precision bounding box regression.
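The abstract does not give the formulas, so the three steps can only be illustrated under assumptions. The sketch below is not the proposed loss. It assumes a box (cx, cy, w, h) is modeled as a Gaussian with mean (cx, cy) and diagonal standard deviations (w/2, h/2), uses the squared 2-Wasserstein distance between such Gaussians as a stand-in for the Gaussian squared distance, and uses a hypothetical logarithmic mapping as the nonlinear function. All function names are illustrative.

```python
import math


def box_to_gaussian(box):
    """Map a box (cx, cy, w, h) to a Gaussian: mean and per-axis std.

    Assumption: std = (w/2, h/2), i.e. covariance diag(w^2/4, h^2/4).
    """
    cx, cy, w, h = box
    return (cx, cy), (w / 2.0, h / 2.0)


def gaussian_squared_distance(box_a, box_b):
    """Squared 2-Wasserstein distance between two diagonal Gaussians.

    Used here as a stand-in for the distance named in the abstract.
    The center term reflects center point distance; the shape term
    reflects differences in width and height (and so aspect ratio).
    """
    (ax, ay), (asx, asy) = box_to_gaussian(box_a)
    (bx, by), (bsx, bsy) = box_to_gaussian(box_b)
    center_term = (ax - bx) ** 2 + (ay - by) ** 2
    shape_term = (asx - bsx) ** 2 + (asy - bsy) ** 2
    return center_term + shape_term


def nonlinear_loss(box_a, box_b):
    """Hypothetical nonlinear mapping of the distance into [0, 1).

    The loss is 0 for identical boxes and grows monotonically with
    the distance, with a compressed range for very distant boxes.
    """
    d2 = gaussian_squared_distance(box_a, box_b)
    return 1.0 - 1.0 / (1.0 + math.log1p(d2))


if __name__ == "__main__":
    target = (0.0, 0.0, 2.0, 2.0)
    for pred in [(0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 2.0, 2.0), (3.0, 4.0, 2.0, 2.0)]:
        print(pred, nonlinear_loss(pred, target))
```

Unlike plain IoU, a distance of this kind stays informative when the two boxes do not overlap, because the center and shape terms keep changing as the prediction moves. Whether the proposed loss uses this particular distance or mapping is not stated in the abstract.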