Journal of Applied Sciences, 2023, Vol. 41, Issue (5): 777-788. doi: 10.3969/j.issn.0255-8297.2023.05.005

• Signal and Information Processing •

Sparse Set Object Detection Combined with Transformer Multi-scale Instance Interaction

KAN Yaya (阚亚亚), ZHANG Sunjie (张孙杰), XIONG Juan (熊娟), ZU Yi (祖奕)

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2022-04-21 Published: 2023-09-28
  • Corresponding author: ZHANG Sunjie, associate professor; research interests include intelligent image processing, fuzzy control, and filtering. E-mail: zhang_sunjie@126.com
  • Funding:
    Supported by the Shanghai Chenguang Scholars Fund (No. 18CG52)

Abstract: Sparse set object detection methods suffer from feature maps that lack spatial detail, object features that do not interact with instances across the global context, and global semantic information that is not fully learned. To address these problems, a sparse set object detection algorithm combining adaptive feature augmentation and instance feature interaction is designed. During feature extraction, the adaptive feature augmentation module uses pooling and convolution at different scales to enrich high-level semantic information and to reduce interference from background noise in low-level semantic information, which lowers the false detection and missed detection rates. In the bounding box regression design, the instance feature interaction module combines the multi-layer attention of the transformer with channel attention and a dynamic convolution network to enhance the channel information of the proposal boxes, which improves the edge information of objects and increases the efficiency of instance feature interaction in the network. In experimental comparisons with the original network, average precision improves by 4.2% on the COCO2017 dataset, including 4.6% on large objects, and by 2.7% on the PASCAL VOC dataset.

Key words: sparse set object detection, multi-scale feature, instance feature interaction, transformer
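
The abstract names channel attention as one ingredient for enhancing the channel information of proposal boxes but does not give its exact form. As a rough illustration only, the sketch below shows a generic squeeze-and-excitation style channel attention in plain Python. It is not the authors' module: the function name `channel_attention`, the weight matrices `w1` and `w2`, and the nested-list feature layout are all assumptions made for this example.

```python
import math


def channel_attention(features, w1, w2):
    """Generic squeeze-and-excitation style channel reweighting (illustrative).

    features: list of C channels, each a list of H rows of W floats.
    w1: hidden x C weight matrix (hypothetical; learned in a real network).
    w2: C x hidden weight matrix (hypothetical; learned in a real network).
    Returns the reweighted features and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]
    return scaled, gates


# Two 2x2 channels standing in for a tiny proposal feature.
example = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
scaled, gates = channel_attention(example, w1=[[1.0, 0.0]], w2=[[1.0], [-1.0]])
```

In this toy call the first channel receives a gate above 0.5 and the second a gate below 0.5, so the first channel is emphasized relative to the second. In a trained detector the weights would be learned rather than fixed by hand.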

CLC number: