Conventional deep learning models achieve limited recognition performance on infrared images, primarily because, lacking discriminative features, they struggle to distinguish similar actions. To address this problem, this paper proposes a human action recognition method for infrared images based on median-guided multi-scale feature fusion. First, an information modeling mechanism that integrates median-enhanced attention with multi-scale feature comparison is constructed. This mechanism models the differences between feature hierarchies in fine detail and guides the network to focus on the key feature regions that distinguish different action categories, thereby overcoming the reliance of traditional methods on global features for classification. Second, a median-enhanced spatial and channel attention module is designed. Because deep features lack explicit positional information, traditional Siamese networks have difficulty focusing accurately on the key regions of the human body in infrared action images, and this module addresses that problem. Finally, a multi-scale feature fusion module is proposed that fuses features across scales, improves the representation of action details and structural information in infrared images, strengthens the model's ability to capture subtle action changes, and reduces misclassifications caused by information loss and background interference. Experimental results show that the proposed method achieves higher recognition accuracy than existing mainstream methods on multiple datasets, including the infrared splicing, PUB, and VAIS datasets, which demonstrates its effectiveness.
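The abstract does not specify the internals of the median-enhanced spatial and channel attention module. The following is a minimal pure-Python sketch under explicitly assumed design choices. It uses a CBAM-style layout in which channel gating is followed by spatial gating, and a median descriptor is pooled alongside the usual mean and max descriptors. The learned MLP and convolution of a real module are replaced by fixed scalar weights (`w_avg`, `w_max`, `w_med`). All function and parameter names are hypothetical. The sketch illustrates the general idea only and is not the authors' implementation.

```python
import math
from statistics import median
from typing import List

# Assumed layout: fmap[c][h][w], i.e. C channels, each an H x W grid of floats.
# The batch dimension is omitted for clarity.
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_attention(fmap: FeatureMap, w_avg: float = 1.0,
                      w_max: float = 1.0, w_med: float = 1.0) -> FeatureMap:
    """Rescale each channel by a gate built from its mean, max and median.

    The fixed weights w_* are hypothetical stand-ins for a learned MLP.
    """
    out = []
    for ch in fmap:
        vals = [v for row in ch for v in row]
        desc = (w_avg * sum(vals) / len(vals)
                + w_max * max(vals)
                + w_med * median(vals))
        gate = sigmoid(desc)  # one scalar gate in (0, 1) per channel
        out.append([[v * gate for v in row] for row in ch])
    return out


def spatial_attention(fmap: FeatureMap, w_avg: float = 1.0,
                      w_max: float = 1.0, w_med: float = 1.0) -> FeatureMap:
    """Rescale each position by a gate built from the cross-channel mean, max and median.

    A real module would apply a learned convolution to these pooled maps;
    here they are simply combined with fixed weights (an assumption).
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            column = [fmap[c][i][j] for c in range(C)]
            desc = (w_avg * sum(column) / C
                    + w_max * max(column)
                    + w_med * median(column))
            gate = sigmoid(desc)  # one scalar gate per spatial position
            for c in range(C):
                out[c][i][j] = fmap[c][i][j] * gate
    return out


def median_enhanced_attention(fmap: FeatureMap) -> FeatureMap:
    """Channel gating followed by spatial gating (ordering assumed, as in CBAM)."""
    return spatial_attention(channel_attention(fmap))


if __name__ == "__main__":
    # A single 2 x 2 channel with one isolated "hot" value.
    demo = [[[0.0, 0.0], [0.0, 100.0]]]
    print(median_enhanced_attention(demo))
```

One plausible motivation for the median descriptor is its robustness to outliers. For the 2 x 2 channel in the demo, with values 0, 0, 0 and 100, the mean is 25, the max is 100, and the median is 0. An isolated extreme activation, such as a hot spot in a thermal image, therefore shifts the median far less than it shifts the mean or the max.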