Journal of Applied Sciences ›› 2025, Vol. 43 ›› Issue (1): 154-168. doi: 10.3969/j.issn.0255-8297.2025.01.011
DENG Shuchong, CHEN Aibin, DAI Zijian
Received:
2024-07-10
Online:
2025-01-30
Published:
2025-01-24
Corresponding author:
CHEN Aibin, professor; research interests: deep learning, audio processing, and ecological applications of artificial intelligence. E-mail: hotaibin@163.com
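Abstract: Traditional action recognition methods show low recognition rates and high misclassification rates on complex bird behavior patterns. To address this, a deep learning model is proposed: an improved 3D residual network based on a multiple excitation module and pyramid split attention. The frame difference method is used to reduce the computational burden while accurately preserving key spatiotemporal information and improving recognition accuracy. A multiple excitation module is introduced to improve the original residual block, enabling the model to capture subtle motion features and resolving the confusion that arises when recognizing complex dynamic bird behaviors. 3D pyramid split attention replaces the original 3D convolutional layers so that bird behavior features at different scales are captured effectively. In experiments on a self-built bird behavior video dataset, the model reaches a recognition accuracy of 90.48% on common bird behaviors, significantly outperforming the baseline model and other popular action recognition networks, which demonstrates the effectiveness of the proposed model for recognizing complex bird behaviors.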
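The abstract does not say how the frame difference method is applied, so the following is only a minimal sketch under one assumption: that differences between consecutive frames are used to score motion and keep the most informative frames before they reach the 3D network. The function names `frame_difference_scores` and `select_motion_frames` and the parameter `num_keep` are hypothetical and do not come from the paper.

```python
import numpy as np


def frame_difference_scores(frames: np.ndarray) -> np.ndarray:
    """Score motion between consecutive frames.

    frames: array of shape (T, H, W), grayscale video clip.
    Returns an array of shape (T - 1,), where entry i is the mean
    absolute pixel difference between frame i and frame i + 1.
    """
    # Cast to float so that uint8 subtraction cannot wrap around.
    frames = frames.astype(np.float32)
    diffs = np.abs(frames[1:] - frames[:-1])
    return diffs.mean(axis=(1, 2))


def select_motion_frames(frames: np.ndarray, num_keep: int) -> np.ndarray:
    """Return the sorted indices of the num_keep frames with the most motion.

    Assumption (not from the paper): a frame's motion is measured by its
    difference from the previous frame, so frame 0 is never selected.
    """
    scores = frame_difference_scores(frames)
    # scores[i] describes frame i + 1, hence the + 1 offset.
    top = np.argsort(scores)[::-1][:num_keep] + 1
    return np.sort(top)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.integers(0, 256, size=(16, 32, 32), dtype=np.uint8)
    print(select_motion_frames(clip, num_keep=8))
```

Under this assumption, a 16-frame clip reduced to 8 frames halves the temporal extent the network must process, which is one plausible reading of the reduced computational burden; the paper itself may use the differences in another way, for example as an input stream.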
DENG Shuchong, CHEN Aibin, DAI Zijian. Bird Action Recognition Based on Multiple Excitation and Pyramid Split Attention[J]. Journal of Applied Sciences, 2025, 43(1): 154-168.