Bird Action Recognition Based on Multiple Excitation and Pyramid Split Attention

Deng Shuchong, Chen Aibin, Dai Zijian (邓抒憧, 陈爱斌, 戴子健)

doi:10.3969/j.issn.0255-8297.2025.01.011

Journal of Applied Sciences (应用科学学报), 2025, Vol. 43, Issue 1: 154-168


Special Issue on Computer Applications


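Institute of Artificial Intelligence Application, Central South University of Forestry and Technology, Changsha 410004, Hunan, China

Received: 2024-07-10

Published online: 2025-01-24

Funding

Supported by the National Natural Science Foundation of China (No. 62276276) and the Natural Science Foundation of Hunan Province (No. 2024JJ5647).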


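Abstract

To address the low recognition accuracy and high misclassification rate of traditional action recognition methods on complex bird action patterns, an improved 3D residual network is proposed that integrates a multiple-excitation module and pyramid split attention, with the aim of improving both the accuracy and the efficiency of bird action recognition. The inter-frame difference method is used to reduce the computational burden while preserving key spatio-temporal information, which also improves recognition accuracy. A multiple-excitation module is introduced to improve the original residual block so that the model captures subtle motion features, reducing confusion between complex dynamic bird actions. The original 3D convolutional layer is replaced with 3D pyramid split attention to capture bird action features at different scales. In experiments on a self-built bird action video dataset, the model reaches a recognition accuracy of 90.48% on common bird actions, significantly outperforming the baseline model and other popular action recognition networks, which supports its effectiveness for complex bird action recognition.

Keywords: bird action recognition; multiple excitation; pyramid split attention; inter-frame difference method; self-built dataset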
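The abstract does not spell out how the inter-frame difference step is configured, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes grayscale frames stored as nested lists of integers; the function names and the `pixel_threshold` and `min_score` parameters are hypothetical choices made for illustration. A frame is kept when enough of its pixels changed relative to the preceding frame, which is one common way frame differencing discards near-static frames before a heavier 3D network runs.

```python
from typing import List

# A grayscale frame: rows of pixel intensities in 0..255 (assumed layout).
Frame = List[List[int]]


def frame_difference(prev: Frame, curr: Frame) -> Frame:
    """Per-pixel absolute difference between two same-sized frames."""
    return [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def motion_score(prev: Frame, curr: Frame, pixel_threshold: int = 25) -> float:
    """Fraction of pixels whose intensity changed by more than pixel_threshold."""
    diff = frame_difference(prev, curr)
    total = sum(len(row) for row in diff)
    changed = sum(1 for row in diff for value in row if value > pixel_threshold)
    return changed / total if total else 0.0


def select_motion_frames(
    frames: List[Frame], pixel_threshold: int = 25, min_score: float = 0.1
) -> List[int]:
    """Indices of frames that differ enough from the frame just before them.

    Both thresholds are illustrative values, not taken from the paper.
    """
    kept = []
    for i in range(1, len(frames)):
        if motion_score(frames[i - 1], frames[i], pixel_threshold) >= min_score:
            kept.append(i)
    return kept


if __name__ == "__main__":
    still = [[0, 0], [0, 0]]
    moved = [[200, 0], [0, 0]]
    # Only the transition from a still frame to a changed frame is kept.
    print(select_motion_frames([still, still, moved, moved]))
```

In practice the same idea is usually written with an array library, and the kept frames (or the difference images themselves) would then be passed to the recognition network.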

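Cite this article

Deng Shuchong, Chen Aibin, Dai Zijian. Bird Action Recognition Based on Multiple Excitation and Pyramid Split Attention[J]. Journal of Applied Sciences, 2025, 43(1): 154-168. DOI: 10.3969/j.issn.0255-8297.2025.01.011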


