Journal of Applied Sciences ›› 2026, Vol. 44 ›› Issue (2): 266-281. doi: 10.3969/j.issn.0255-8297.2026.02.007

• Intelligent Information Processing •

Human Action Recognition in Infrared Images Based on Median-Guided Multi-scale Feature Fusion

YUAN Shuai, YU Lei, YAO Tian, XIONG Bangshu

  1. Jiangxi Provincial Key Laboratory of Image Processing and Pattern Recognition, Nanchang Hangkong University, Nanchang 330063, Jiangxi, China
  • Received: 2024-06-05 Published: 2026-04-07
  • Corresponding author: YU Lei, professor; research interests: image processing and computer vision. E-mail: yulei@nchu.edu.cn
  • Funding:
    National Natural Science Foundation of China (No.62162044); Key Research and Development Program of Jiangxi Province (No.20212BBE53017)

Abstract: Conventional deep learning models achieve limited recognition performance on infrared images, primarily because the lack of discriminative features makes it difficult for them to distinguish similar actions. To address this problem, this paper proposes an infrared human action recognition method based on median-guided multi-scale feature fusion. First, an information modeling mechanism is constructed that integrates median-enhanced attention with multi-scale feature comparison. By finely modeling the differences between feature levels, the mechanism guides the network to focus on the key feature regions that distinguish action categories, overcoming the reliance of traditional methods on global features for classification. Second, a median-enhanced spatial and channel attention module is designed to address the difficulty that traditional Siamese networks have in accurately focusing on key regions of the human body in infrared action images, a difficulty that arises because deep features lack explicit positional information. Finally, a multi-scale feature fusion module is proposed. It fuses multi-scale features to improve the representation of action details and structural information in infrared images, strengthens the model's ability to capture subtle action changes, and reduces the misclassification rate caused by information loss and background interference. Experimental results show that the recognition accuracy of the proposed method exceeds that of existing mainstream methods on several datasets, including the infrared splicing, PUB, and VAIS datasets, which demonstrates the effectiveness and advanced performance of the method.

Key words: human action recognition, infrared image, median-enhanced spatial and channel attention, multi-scale feature fusion

CLC number:
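
The abstract names a median-enhanced attention module and a multi-scale feature fusion module but does not give their design. As a rough illustration only, the sketch below shows one plausible reading: channel attention in which a median-pooled descriptor supplements the usual average and max descriptors, followed by fusion of a coarse feature map with a fine one. All function names, the sum-then-sigmoid scoring, the nearest-neighbour upsampling, and the additive fusion are assumptions made for this example and are not taken from the paper.

```python
import math
import statistics


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def median_enhanced_channel_weights(feature_map):
    """Return one attention weight per channel.

    feature_map: list of channels, each an H x W nested list of floats.
    Assumption: the average, max, and median descriptors are summed and
    passed through a sigmoid. The paper's module may combine them
    differently, for example through a learned shared MLP.
    """
    weights = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        avg_desc = sum(values) / len(values)
        max_desc = max(values)
        med_desc = statistics.median(values)  # less sensitive to isolated hot pixels
        weights.append(sigmoid(avg_desc + max_desc + med_desc))
    return weights


def apply_channel_attention(feature_map, weights):
    """Scale every value of each channel by that channel's weight."""
    return [[[w * v for v in row] for row in channel]
            for channel, w in zip(feature_map, weights)]


def upsample_nearest(channel, factor):
    """Nearest-neighbour upsampling of one H x W channel by an integer factor."""
    height, width = len(channel), len(channel[0])
    return [[channel[i // factor][j // factor] for j in range(width * factor)]
            for i in range(height * factor)]


def fuse_scales(fine, coarse):
    """Hypothetical fusion: upsample the coarse channel, then add element-wise.

    Assumes the fine resolution is an integer multiple of the coarse one.
    """
    factor = len(fine) // len(coarse)
    upsampled = upsample_nearest(coarse, factor)
    return [[f + c for f, c in zip(fine_row, coarse_row)]
            for fine_row, coarse_row in zip(fine, upsampled)]
```

In this sketch a channel whose values are all zero receives a weight of 0.5, while a channel with strong responses is weighted close to 1. A trained network would learn how to combine the descriptors rather than summing them directly.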