Computer Science and Applications

Leopards Individual Recognition Based on Bi-level Routing Attention and Self-Calibrated Convolution

  • 1. Institute of Artificial Intelligence Application, Central South University of Forestry and Technology, Changsha 410004, Hunan, China;
    2. Chinese Felid Conservation Alliance, Beijing 100875, China;
    3. Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China

Received date: 2024-07-11

Online published: 2025-04-03

Funding

Supported by the National Natural Science Foundation of China (No. 62276276) and the Natural Science Foundation of Hunan Province (No. 2024JJ5647)

Cite this article

杨婉, 陈爱斌, 赵莹, 武阅, 甑鑫, 肖治术. Leopards Individual Recognition Based on Bi-level Routing Attention and Self-Calibrated Convolution[J]. Journal of Applied Sciences (应用科学学报), 2025, 43(2): 348-360. DOI: 10.3969/j.issn.0255-8297.2025.02.012

Abstract

Infrared camera images of leopards in natural environments are difficult to use for individual recognition for two reasons: the animals blend strongly into their surroundings, and inter-class similarity is high. To address both problems, an improved EfficientNet model is proposed that incorporates self-calibrated convolution and bi-level routing attention. Self-calibrated convolution adaptively builds long-range spatial and inter-channel dependencies around each spatial location and explicitly incorporates richer contextual information, which strengthens the recognition of fine-grained features and mitigates the difficulty caused by high inter-class similarity. Bi-level routing attention combines top-down global attention with bottom-up local attention to address the strong blending of individuals with their environment. Experimental results show that the proposed model reaches an accuracy of 95.56% on the leopard individual recognition task, significantly higher than the original EfficientNet, which validates the effectiveness and superiority of the proposed model for this task.
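The two-level attention described in the abstract can be illustrated with a small, dependency-free sketch. It is a simplified illustration of the general bi-level routing idea and not the authors' implementation: learned query/key/value projections are omitted, and the names `route_regions`, `bilevel_attention`, and `toy_regions` are hypothetical. The coarse step ranks regions by the similarity of their mean features and keeps the top `top_k`; the fine step lets each token attend only to tokens gathered from those routed regions.

```python
import math


def mean_vec(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_regions(regions, top_k):
    """Coarse level: for each region, keep the top_k most similar regions.

    Each region is summarised by the mean of its token features
    (an assumption of this sketch; no learned projections are used).
    """
    summaries = [mean_vec(tokens) for tokens in regions]
    routes = []
    for q in summaries:
        affinity = [dot(q, k) for k in summaries]
        ranked = sorted(range(len(regions)), key=lambda j: affinity[j], reverse=True)
        routes.append(ranked[:top_k])
    return routes


def bilevel_attention(regions, top_k):
    """Fine level: each token attends only to tokens of its routed regions."""
    routes = route_regions(regions, top_k)
    dim = len(regions[0][0])
    attended = []
    for i, tokens in enumerate(regions):
        # Gather key/value tokens from the regions selected for region i.
        gathered = [t for j in routes[i] for t in regions[j]]
        region_out = []
        for q in tokens:
            weights = softmax([dot(q, k) / math.sqrt(dim) for k in gathered])
            region_out.append(
                [sum(w * v[d] for w, v in zip(weights, gathered)) for d in range(dim)]
            )
        attended.append(region_out)
    return routes, attended


# Toy input: 3 regions, 2 tokens per region, 2 feature channels (made-up values).
toy_regions = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[1.0, 0.1], [0.8, 0.0]],
]
routes, attended = bilevel_attention(toy_regions, top_k=2)
print(routes)
```

In this toy input, regions 0 and 2 have similar features, so they route to each other and largely ignore region 1. This is the intuition behind restricting fine-grained attention to relevant regions, for example patches of the animal as opposed to background.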
