Infrared camera images of leopards in natural environments make individual recognition difficult, because individuals blend strongly into their surroundings and different individuals look highly alike (high inter-class similarity). To address these challenges, an improved EfficientNet model is proposed that incorporates self-calibrated convolution and bi-level routing attention. The self-calibrated convolution adaptively builds long-range spatial and inter-channel dependencies around each spatial location. By explicitly incorporating richer contextual information, it strengthens the model's ability to recognize fine details, which mitigates the difficulty caused by high inter-class similarity. Meanwhile, the bi-level routing attention combines a top-down global attention strategy with a bottom-up local attention strategy to address the strong blending of individuals into their environment. Experimental results show that the proposed model reaches an accuracy of 95.56% on the leopard individual recognition task, significantly higher than that of the original EfficientNet. These findings validate the effectiveness and superiority of the proposed model for leopard individual recognition.
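As an illustration of the coarse, region-level step in bi-level routing attention, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes the feature map has already been split into regions, that each region holds a list of per-token query and key vectors, and that each region keeps only its `top_k` most relevant regions. The names `route_regions`, `mean_vector`, `dot`, and the toy data are hypothetical. The fine-grained token-to-token attention that would then run inside the routed regions is omitted.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def route_regions(
    region_queries: List[List[Vector]],
    region_keys: List[List[Vector]],
    top_k: int,
) -> List[List[int]]:
    """For each region, return the indices of the top_k regions whose
    averaged key has the highest affinity with that region's averaged query."""
    # Region-level descriptors: average the token vectors inside each region.
    region_q = [mean_vector(tokens) for tokens in region_queries]
    region_k = [mean_vector(tokens) for tokens in region_keys]

    routes = []
    for q in region_q:
        # Region-to-region affinity scores for this query region.
        scores = [dot(q, k) for k in region_k]
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        routes.append(ranked[:top_k])
    return routes


# Toy data (assumed): 3 regions, 2 tokens per region, 2 channels per token.
queries = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
    [[1.0, 2.0], [1.0, 2.0]],
]
keys = [
    [[2.0, 0.0], [2.0, 0.0]],
    [[0.0, 3.0], [0.0, 1.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]

routes = route_regions(queries, keys, top_k=2)
print(routes)
```

Restricting each region to a few routed regions is what keeps the subsequent fine-grained attention sparse: tokens attend only to content the coarse step judged relevant, rather than to the whole image.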