Journal of Applied Sciences ›› 2025, Vol. 43 ›› Issue (2): 234-244. doi: 10.3969/j.issn.0255-8297.2025.02.004

• Communication Engineering •

Video Anomaly Detection Method Based on Multi-scale Feature Fusion and Attention Mechanism

WU Xiang, XIAO Jian, JI Genlin   

  1. School of Computer and Electronic Information/School of Artificial Intelligence, Nanjing Normal University, Nanjing 210023, Jiangsu, China
  • Received: 2023-07-19 Published: 2025-04-03
  • Corresponding author: JI Genlin, professor and doctoral supervisor; research interests: big data analysis and mining. E-mail: glji@njnu.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (No. 41471371)

Abstract: Moving objects in video frames often appear at different scales over time, which makes video anomaly detection challenging. Although traditional generative adversarial networks (GANs) have achieved some success on this task, they extract features at a single scale and therefore cannot fully capture objects of different sizes, which limits their detection performance. To address this issue, this paper proposes a GAN-based video anomaly detection method that incorporates multi-scale feature fusion and an attention mechanism. Convolutional kernels of different sizes capture features with different receptive fields, and these features are fused into a multi-scale feature representation. In addition, a coordinate attention mechanism is introduced after the transposed convolutional layers of the generator to adaptively assign feature-map weights, strengthening the model's perception of key features. Experimental results on the public datasets UCSD Ped2 and Avenue show that the proposed method outperforms other comparable methods.

Key words: video anomaly detection, deep learning, generative adversarial networks, multi-scale feature fusion, attention mechanism
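
The two ideas named in the abstract, multi-scale feature fusion and coordinate attention, can be illustrated with a toy sketch in plain Python. This is not the authors' network: fixed averaging filters stand in for learned convolutions, fusion is assumed to be channel concatenation, and the attention gate omits the learned 1x1 convolutions of the published coordinate attention module. All function names and the kernel sizes (1, 3, 5) are hypothetical choices made for illustration.

```python
import math


def box_filter(img, k):
    """Average over a k x k window with zero padding.

    Assumption: a fixed averaging filter stands in for a learned k x k convolution.
    """
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += img[y][x]
            out[i][j] = total / (k * k)
    return out


def multi_scale_features(img, kernel_sizes=(1, 3, 5)):
    """One feature map per kernel size, stacked as channels.

    Assumption: fusion is done by concatenating the maps along the channel axis.
    """
    return [box_filter(img, k) for k in kernel_sizes]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def coordinate_attention(channel):
    """Reweight one feature map with gates from its row and column averages.

    Simplification: the learned transforms between pooling and gating are omitted.
    """
    h, w = len(channel), len(channel[0])
    row_gate = [sigmoid(sum(row) / w) for row in channel]
    col_gate = [sigmoid(sum(channel[i][j] for i in range(h)) / h) for j in range(w)]
    return [
        [channel[i][j] * row_gate[i] * col_gate[j] for j in range(w)]
        for i in range(h)
    ]


# Usage: a 5 x 5 frame with a single bright pixel in the centre.
frame = [[0.0] * 5 for _ in range(5)]
frame[2][2] = 1.0
fused = multi_scale_features(frame)                   # 3 channels, one per kernel size
attended = [coordinate_attention(c) for c in fused]   # same shape, reweighted
```

Larger kernels spread the bright pixel over a wider neighbourhood, which is the sense in which each channel sees a different receptive field. The attention step then scales every position by gates that depend on its row and its column, so position information along both axes is preserved.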
