Journal of Applied Sciences, 2025, Vol. 43, Issue (1): 154-168. doi: 10.3969/j.issn.0255-8297.2025.01.011

• Special Issue on Computer Applications •

Bird Action Recognition Based on Multiple Excitation and Pyramid Split Attention

DENG Shuchong, CHEN Aibin, DAI Zijian

  1. Institute of Artificial Intelligence Application, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
  • Received: 2024-07-10  Online: 2025-01-30  Published: 2025-01-24
  • Corresponding author: CHEN Aibin, Professor, research interests: deep learning, audio processing, and ecological applications of artificial intelligence. E-mail: hotaibin@163.com
  • Funding:
    National Natural Science Foundation of China (No. 62276276); Natural Science Foundation of Hunan Province (No. 2024JJ5647)

Abstract: Traditional action recognition methods suffer from low recognition accuracy and high misclassification rates when dealing with complex bird action patterns. To address this, an enhanced deep learning model is proposed that improves a 3D residual network with a multiple-excitation module and pyramid split attention, aiming to improve both the accuracy and the efficiency of bird action recognition. The inter-frame difference method is used to reduce the computational burden while accurately preserving key spatio-temporal information, which also improves recognition accuracy. A multiple-excitation module is introduced into the original residual block so that the model can accurately capture subtle motion features, addressing the confusion that arises when recognizing complex dynamic bird actions. In addition, the original 3D convolutional layer is replaced with 3D pyramid split attention to capture bird action features at different scales. Experiments on a self-built bird action video dataset show a recognition accuracy of 90.48% on common bird actions, significantly outperforming the baseline model and other popular action recognition networks. These results confirm the effectiveness of the proposed model for complex bird action recognition.

Key words: bird action recognition, multiple excitation, pyramid split attention, inter-frame difference method, self-built dataset
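The abstract states that the inter-frame difference method reduces computation while keeping key spatio-temporal information, but it does not say exactly how the differences are used. The following is a minimal NumPy sketch under one plausible assumption, key-frame selection by motion energy: the absolute difference between consecutive frames measures how much each frame changes, and only the k frames with the highest scores would be passed on to the 3D network. The function names `frame_differences` and `select_key_frames`, the mean-absolute-difference score, and the value of k are hypothetical and not taken from the paper.

```python
import numpy as np


def frame_differences(frames: np.ndarray) -> np.ndarray:
    """Absolute difference between consecutive frames.

    frames: uint8 video clip of shape (T, H, W, C).
    Returns a uint8 array of shape (T - 1, H, W, C).
    """
    # Promote to a signed type so uint8 subtraction cannot wrap around.
    f = frames.astype(np.int16)
    return np.abs(f[1:] - f[:-1]).astype(np.uint8)


def select_key_frames(frames: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical selector: indices of the k frames with the most motion.

    Motion score of frame t (t >= 1) is the mean of |frame_t - frame_{t-1}|;
    frame 0 has no predecessor and gets score 0. Indices are returned in
    temporal order so the reduced clip is still a valid sequence.
    """
    if not 0 < k <= len(frames):
        raise ValueError("k must satisfy 1 <= k <= number of frames")
    diffs = frame_differences(frames)
    per_frame = diffs.mean(axis=tuple(range(1, diffs.ndim)))
    scores = np.concatenate(([0.0], per_frame))
    # Stable sort on negated scores: highest motion first, ties keep earlier frames.
    top = np.argsort(-scores, kind="stable")[:k]
    return np.sort(top)


clip = np.zeros((8, 4, 4, 3), dtype=np.uint8)
clip[5] = 200  # a sudden change at frame 5 (and back again at frame 6)
print(select_key_frames(clip, k=2))  # [5 6]
```

Casting to a signed type before subtracting matters: with raw uint8 arithmetic, 5 - 10 would wrap to 251 and inflate the motion score of frames that merely get darker.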

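Pyramid split attention (PSA), as introduced with EPSANet, splits the channels into groups, processes each group at a different kernel size, computes channel weights for every group with a shared squeeze-and-excitation (SE) block, and applies a softmax across the groups so that the scales compete for each channel. The abstract does not detail the authors' 3D variant, so the following is only a minimal NumPy sketch of that data flow on a (C, T, H, W) feature map, under stated assumptions: fixed mean filters stand in for learned 3D group convolutions, the SE weights are random, and the kernel sizes (3, 5, 7, 9), the reduction ratio of 4, and the names `box_filter_3d` and `psa3d` are illustrative choices, not the paper's.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_filter_3d(x: np.ndarray, k: int) -> np.ndarray:
    """Depthwise k x k x k mean filter over (T, H, W) with zero padding.

    x has shape (C, T, H, W); k must be odd so the output keeps the same shape.
    Stand-in for a learned 3D group convolution with kernel size k.
    """
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p), (p, p)))
    windows = sliding_window_view(xp, (k, k, k), axis=(1, 2, 3))
    return windows.mean(axis=(-3, -2, -1))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def psa3d(x, kernels=(3, 5, 7, 9), reduction=4, seed=0):
    """PSA-style multi-scale channel attention on a (C, T, H, W) feature map."""
    c = x.shape[0]
    s = len(kernels)
    if c % s:
        raise ValueError("channel count must be divisible by the number of branches")
    cs = c // s
    # 1) Split channels into s groups and process each at a different scale.
    feats = np.stack([box_filter_3d(g, k)
                      for g, k in zip(np.split(x, s, axis=0), kernels)])  # (s, cs, T, H, W)
    # 2) One SE block shared by all branches: GAP -> FC -> ReLU -> FC -> sigmoid.
    #    Weights are random here; in a trained network they would be learned.
    rng = np.random.default_rng(seed)
    hidden = max(cs // reduction, 1)
    w1 = 0.1 * rng.standard_normal((hidden, cs))
    w2 = 0.1 * rng.standard_normal((cs, hidden))
    gap = feats.mean(axis=(2, 3, 4))                  # (s, cs)
    se = sigmoid(np.maximum(gap @ w1.T, 0.0) @ w2.T)  # (s, cs)
    # 3) Softmax across branches so the scales compete per channel.
    attn = softmax(se, axis=0)                        # (s, cs), sums to 1 over s
    # 4) Reweight each branch and concatenate back to c channels.
    out = feats * attn[:, :, None, None, None]
    return out.reshape(c, *x.shape[1:]), attn


x = np.random.default_rng(1).standard_normal((8, 4, 6, 6))  # (C, T, H, W)
y, attn = psa3d(x)
print(y.shape, attn.shape)  # (8, 4, 6, 6) (4, 2)
```

Because the softmax runs over the branch axis rather than the channel axis, each channel position distributes a total weight of 1 across the small, medium, and large receptive fields, which is what lets a PSA-style layer adapt to actions of different spatial and temporal extent.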