Mask Wearing Detection in Complex Scenes Based on Mask-YOLO

doi:10.3969/j.issn.0255-8297.2022.01.009

Journal of Applied Sciences, 2022, Vol. 40, Issue (1): 93-104.

WEI Mingjun^1,2, ZHOU Taiyu^1, JI Zhanlin^1,2, ZHANG Xinnan^1

1. College of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, Hebei, China;
2. Hebei Provincial Key Laboratory of Industrial Intelligent Perception, North China University of Science and Technology, Tangshan 063210, Hebei, China

Received: 2021-10-27 Online: 2022-01-28 Published: 2022-01-28
Corresponding author: WEI Mingjun, professor, whose research interests are image processing and information security technology. E-mail: 109849249@qq.com
Funding:
Supported by the Key Research and Development Program of the Ministry of Science and Technology (No. 2017YFE0135700)

Abstract

Mask wearing detection in public places suffers from low accuracy when targets are occluded, densely packed, or small in scale. To address this problem, a Mask-YOLO algorithm is proposed on the basis of the real-time object detection algorithm YOLOv3. First, a channel attention mechanism is introduced into the feature fusion process to highlight important features, which reduces the influence of redundant features after fusion and improves feature utilization. Second, the complete intersection over union (CIoU) loss replaces the mean square error (MSE) loss as the bounding-box regression loss, which improves localization accuracy. Finally, in addition to the wearing and not-wearing cases, incorrectly worn masks are also detected. Experimental results show that, compared with YOLOv3, Mask-YOLO improves mean average precision (mAP) by 4.78% while frames per second (FPS) drops by only 1%. Compared with other mainstream object detection algorithms, Mask-YOLO also achieves better detection performance and robustness for mask wearing detection in complex scenes.

Key words: mask wearing detection, Mask-YOLO, attention mechanism, feature fusion, loss function
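
The abstract names CIoU as the bounding-box regression loss but does not spell out its form. The following is a minimal sketch of the commonly used CIoU definition (an IoU term, a normalized center-distance term, and an aspect-ratio penalty), assuming boxes in (x1, y1, x2, y2) corner format. The function name `ciou_loss` and the `eps` guard are illustrative choices and are not taken from the paper.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union of the two boxes.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Squared distance between box centers.
    rho2 = ((px1 + px2) - (tx1 + tx2)) ** 2 / 4 + ((py1 + py2) - (ty1 + ty2)) ** 2 / 4

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


if __name__ == "__main__":
    print(ciou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # partially overlapping boxes
    print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

Unlike a plain IoU loss, the center-distance term keeps changing when the boxes do not overlap, so a predicted box that is far from the target is penalized more than one that is close. This is one plausible reason such a loss can localize better than an MSE loss on box coordinates.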

CLC number: TP391.41

WEI Mingjun, ZHOU Taiyu, JI Zhanlin, ZHANG Xinnan. Mask Wearing Detection in Complex Scenes Based on Mask-YOLO[J]. Journal of Applied Sciences, 2022, 40(1): 93-104.
