Signal and Information Processing

Sparse Set Object Detection Combined with Transformer Multi-scale Instance Interaction

  • School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China

Received date: 2022-04-21

Online published: 2023-09-28

Abstract

To address the lack of spatial detail in feature maps, the failure of target features to interact with global context instances, and the insufficient learning of global semantic information, a sparse set object detection algorithm that combines adaptive feature augmentation with instance feature interaction is designed. During feature extraction, the adaptive feature augmentation module applies pooling and convolution at different scales to enrich high-level semantic information and to suppress noise, such as background clutter carried by low-level semantic information, which in turn reduces the false detection and missed detection rates. For bounding box regression, the instance feature interaction module uses the multi-layer attention of the Transformer to enhance the channel information of the proposal boxes. Channel attention and a dynamic convolution network are also employed to improve the edge information of objects and to make the interaction between instance features more efficient. Experimental results show that average precision improves by 4.2% on the COCO2017 dataset, by 4.6% on large targets, and by 2.7% on the PASCAL VOC dataset.
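
The idea of pooling a feature map at several scales to gather context can be illustrated with a minimal pure-Python sketch. This is not the authors' adaptive feature augmentation module: the function names, the bin sizes, and the additive fusion are assumptions made for illustration, and the convolution layers mentioned in the abstract are left out.

```python
def avg_pool_to_bins(fmap, bins):
    """Adaptive average pooling of a 2D map (list of rows) to a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        # Row range covered by bin i (floor start, ceil end).
        r0, r1 = (i * h) // bins, ((i + 1) * h + bins - 1) // bins
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, ((j + 1) * w + bins - 1) // bins
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, h, w):
    """Nearest-neighbour upsampling of a bins x bins grid back to h x w."""
    bins = len(pooled)
    return [[pooled[(r * bins) // h][(c * bins) // w] for c in range(w)]
            for r in range(h)]


def multi_scale_context(fmap, bin_sizes=(1, 2)):
    """Add the mean of several pooled-and-upsampled context maps to the input.

    bin_sizes and the additive fusion are hypothetical choices, not taken
    from the paper.
    """
    h, w = len(fmap), len(fmap[0])
    contexts = [upsample_nearest(avg_pool_to_bins(fmap, b), h, w)
                for b in bin_sizes]
    return [[fmap[r][c] + sum(ctx[r][c] for ctx in contexts) / len(contexts)
             for c in range(w)] for r in range(h)]
```

For a 4x4 map holding the values 0 to 15 in row-major order, the global (1x1) pooled value is 7.5 and the top-left 2x2 bin averages 2.5, so the top-left output is 0 + (7.5 + 2.5) / 2 = 5.0. Coarse bins contribute image-level context, while finer bins keep more of the local layout.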

Cite this article

KAN Yaya, ZHANG Sunjie, XIONG Juan, ZU Yi. Sparse Set Object Detection Combined with Transformer Multi-scale Instance Interaction[J]. Journal of Applied Sciences, 2023, 41(5): 777-788. DOI: 10.3969/j.issn.0255-8297.2023.05.005
