Special Issue on Computer Application

Bird Action Recognition Based on Multiple Excitation and Pyramid Split Attention

  • Institute of Artificial Intelligence Application, Central South University of Forestry and Technology, Changsha 410004, Hunan, China

Received date: 2024-07-10

Online published: 2025-01-24

Abstract

Traditional action recognition methods suffer from low recognition accuracy and high misclassification rates when dealing with complex bird action patterns. To address this problem, an enhanced deep learning model is proposed that integrates a multiple-excitation module and pyramid split attention into a 3D residual network, with the goal of improving both the accuracy and the efficiency of bird action recognition. The inter-frame difference method is used to reduce the computational burden while preserving critical spatio-temporal information, thereby improving recognition accuracy. A multiple-excitation module is introduced into the original residual block so that the model can accurately capture subtle motion features, which resolves ambiguities in recognizing complex dynamic bird actions. In addition, the original 3D convolutional layer is replaced with 3D pyramid split attention to capture bird action features at different scales. Experiments on a self-built bird action video dataset show a recognition accuracy of 90.48%, which significantly outperforms the baseline model and other popular action recognition networks. These results confirm that the model can effectively handle the complex bird action recognition task.
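As an illustration of the inter-frame difference idea mentioned in the abstract, the following minimal sketch keeps only frames that differ noticeably from their predecessor. It is not the authors' implementation: the function names `mean_abs_diff` and `select_motion_frames`, the mean-absolute-difference measure, and the threshold value are all assumptions made for this example, and frames are represented as small nested lists of grayscale values rather than real video data.

```python
# Illustrative sketch only; names, measure, and threshold are assumptions,
# not details taken from the paper.
# A frame is a grayscale image stored as a nested list of ints (0-255).

def mean_abs_diff(prev, curr):
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(c - p)
            count += 1
    return total / count


def select_motion_frames(frames, threshold=10.0):
    """Return indices of frames whose difference from the previous frame
    exceeds the threshold. The first frame is always kept as a reference."""
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    dark = [[0, 0], [0, 0]]
    bright = [[100, 100], [100, 100]]
    clip = [dark, dark, bright, bright, dark]
    # Frames 1 and 3 repeat their predecessors, so they are skipped.
    print(select_motion_frames(clip))
```

Dropping near-duplicate frames in this way is one plausible reading of how frame differencing can reduce the amount of data passed to a 3D network while retaining frames that carry motion; the paper itself may apply the difference in another way.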

Cite this article

DENG Shuchong, CHEN Aibin, DAI Zijian. Bird Action Recognition Based on Multiple Excitation and Pyramid Split Attention[J]. Journal of Applied Sciences, 2025, 43(1): 154-168. DOI: 10.3969/j.issn.0255-8297.2025.01.011
