Mask-wearing detection in public places suffers from low accuracy because targets are often occluded, densely distributed, and small in scale. To address this problem, a Mask-YOLO algorithm is proposed on the basis of the real-time object detection algorithm YOLOv3. First, a channel attention mechanism is introduced into the feature fusion process. It highlights important features and reduces the influence of the redundant features produced by fusion, which improves feature utilization. Second, the complete intersection over union (CIoU) loss replaces the mean square error (MSE) loss as the bounding box regression loss, which improves localization accuracy. Finally, in addition to detecting faces with and without masks, the algorithm also detects incorrectly worn masks. Experimental results show that, compared with YOLOv3, Mask-YOLO improves the mean average precision (mAP) by 4.78% while its frames per second (FPS) decreases by only 1%. Compared with other mainstream object detection algorithms, Mask-YOLO also achieves better detection performance and robustness for mask-wearing detection in complex scenes.
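The abstract does not say which channel attention module is used. The sketch below assumes a squeeze-and-excitation (SE) style block applied to the concatenated map produced by a YOLOv3-style fusion step. Global average pooling squeezes each channel to one number, two small fully connected layers with ReLU and sigmoid turn these numbers into per-channel weights in (0, 1), and each channel is rescaled by its weight, so channels with low weights (for example, redundant ones after concatenation) contribute less downstream. The function names `se_attention` and `fuse_with_attention`, the nested-list tensor layout, and the bias-free layers are hypothetical simplifications, not Mask-YOLO's actual code.

```python
import math


def se_attention(feature, w1, w2):
    """Squeeze-and-excitation style channel reweighting (biases omitted).

    feature: C channels, each an H x W nested list.
    w1: reduction weights, (C // r) rows of length C.
    w2: expansion weights, C rows of length C // r.
    Returns (reweighted feature, per-channel weights).
    """
    # Squeeze: global average pooling, one descriptor per channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives weights in (0, 1)
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    scale = [1.0 / (1.0 + math.exp(-a)) for a in logits]
    # Scale: multiply every activation in channel c by scale[c]
    out = [[[x * s for x in row] for row in ch] for ch, s in zip(feature, scale)]
    return out, scale


def fuse_with_attention(upsampled_deep, shallow, w1, w2):
    """Hypothetical fusion step: channel concatenation, then SE reweighting."""
    fused = upsampled_deep + shallow  # list concat = concat along the channel axis
    return se_attention(fused, w1, w2)
```

For instance, fusing a one-channel deep map filled with 1.0 and a one-channel shallow map filled with 3.0, using `w1 = [[-1.0, 1.0]]` and `w2 = [[2.0], [-2.0]]`, gives channel weights sigmoid(4) ≈ 0.982 and sigmoid(−4) ≈ 0.018, so the second channel is strongly suppressed. In a real detector these weights are learned, and the block would be written with a framework such as PyTorch.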
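For reference, the standard CIoU loss (Zheng et al., Distance-IoU loss, AAAI 2020) is L = 1 − IoU + ρ²/c² + αv. Here ρ is the distance between the predicted and ground-truth box centres, c is the diagonal of the smallest box enclosing both, v = (4/π²)(arctan(w_gt/h_gt) − arctan(w/h))² measures aspect-ratio mismatch, and α = v / ((1 − IoU) + v). The sketch below evaluates this formula for a single pair of boxes. The name `ciou_loss`, the (x1, y1, x2, y2) corner format, and the `eps` guard are assumptions for illustration. Batching, anchor decoding, and gradient handling in Mask-YOLO are not described in the abstract.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for one predicted box and one ground-truth box.

    Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1 (assumed format).
    L_CIoU = 1 - IoU + rho^2 / c^2 + alpha * v
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # IoU from the intersection and union areas
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # rho^2: squared distance between the two box centres
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4
    # c^2: squared diagonal of the smallest box enclosing both boxes
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # v: aspect-ratio mismatch; alpha: its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

For example, `ciou_loss((0, 0, 2, 2), (1, 1, 3, 3))` gives 1 − 1/7 + 1/9 ≈ 0.968. Two disjoint boxes such as (0, 0, 1, 1) and (2, 2, 3, 3) still receive a distance-dependent penalty, 1 + 8/18, through the ρ²/c² term. An MSE loss on box coordinates, by contrast, treats each coordinate independently and does not measure overlap directly.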