Journal of Applied Sciences ›› 2026, Vol. 44 ›› Issue (2): 266-281. doi: 10.3969/j.issn.0255-8297.2026.02.007

• Intelligent Information Processing •

Human Action Recognition in Infrared Images Based on Median-Guided Multi-scale Feature Fusion

YUAN Shuai, YU Lei, YAO Tian, XIONG Bangshu   

  1. Jiangxi Provincial Key Laboratory of Image Processing and Pattern Recognition, Nanchang Hangkong University, Nanchang 330063, Jiangxi, China
  • Received: 2024-06-05 Published: 2026-04-07

Abstract: Conventional deep learning models achieve limited recognition performance on infrared images, primarily because the lack of discriminative features makes it difficult for them to distinguish similar actions. To address this problem, a human action recognition method for infrared images based on median-guided multi-scale feature fusion is proposed. First, an information modeling mechanism that integrates median-enhanced attention with multi-scale feature comparison is constructed. This mechanism models the differences between feature hierarchies in fine detail and guides the network to focus on the key feature regions that distinguish different action categories, thereby overcoming the limitation of traditional methods that rely on global features for classification. Second, a median-enhanced spatial and channel attention module is designed to address the difficulty that traditional Siamese networks have in focusing accurately on the key regions of the human body in infrared action images, a difficulty caused by the lack of explicit positional information in deep features. Finally, a multi-scale feature fusion module is proposed that fuses features across scales, enhances the representation of action details and structural information in infrared images, strengthens the model’s ability to capture subtle action changes, and reduces the misclassification rate caused by information loss and background interference. Experimental results show that the proposed method achieves higher recognition accuracy than existing mainstream methods on multiple datasets, including infrared splicing, PUB, and VAIS, which demonstrates its effectiveness.

Key words: human action recognition, infrared image, median-enhanced spatial and channel attention, multi-scale feature fusion
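
The abstract does not give the internals of the median-enhanced attention or fusion modules, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that "median-enhanced" means adding a median statistic alongside the usual mean and max pooling in channel and spatial attention, and that multi-scale fusion can be approximated by upsampling a coarser feature map and concatenating it with a finer one. All function names, weights (`w1`, `w2`, `mix`), and tensor shapes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Reweight channels of feat (C, H, W) using mean, max, and median pooling.

    w1 (C_r, C) and w2 (C, C_r) form a shared two-layer MLP (hypothetical weights).
    """
    flat = feat.reshape(feat.shape[0], -1)  # (C, H*W)
    pooled = [flat.mean(axis=1), flat.max(axis=1), np.median(flat, axis=1)]
    # Apply the shared MLP (with ReLU) to each descriptor and sum the results.
    logits = sum(w2 @ np.maximum(w1 @ p, 0.0) for p in pooled)
    return feat * sigmoid(logits)[:, None, None]

def spatial_attention(feat, mix):
    """Reweight spatial positions using per-pixel mean, max, and median over channels.

    mix (3,) is a hypothetical weight vector standing in for a learned convolution.
    """
    stats = np.stack([feat.mean(axis=0), feat.max(axis=0), np.median(feat, axis=0)])
    weights = sigmoid(np.tensordot(mix, stats, axes=1))  # (H, W)
    return feat * weights[None, :, :]

def fuse_scales(fine, coarse):
    """Nearest-neighbour upsample coarse (C, H/2, W/2) by 2 and concatenate with fine."""
    up = coarse.repeat(2, axis=1).repeat(2, axis=2)
    return np.concatenate([fine, up], axis=0)

# Assumed toy inputs: two feature maps at different scales.
rng = np.random.default_rng(0)
fine = rng.standard_normal((8, 16, 16))
coarse = rng.standard_normal((8, 8, 8))
w1 = rng.standard_normal((4, 8)) * 0.1
w2 = rng.standard_normal((8, 4)) * 0.1
mix = np.array([0.5, 0.3, 0.2])

def attend(f):
    return spatial_attention(channel_attention(f, w1, w2), mix)

fused = fuse_scales(attend(fine), attend(coarse))
print(fused.shape)  # (16, 16, 16)
```

The median is included here because it is less sensitive than the mean or max to isolated hot pixels, which is one plausible reason to use it on noisy infrared inputs; whether the paper motivates it this way is not stated in the abstract.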
