This paper proposes a text detection model based on the mask region-based convolutional neural network (Mask R-CNN). First, to expand the receptive field while preserving as much of the model's efficiency as possible, the bottleneck structure of the residual network is optimized, yielding a residual network based on structural optimization (ResNetSO). Second, to remove redundant features and improve the quality of the fused features, a spatial attention mechanism is applied to the feature pyramid network, yielding a feature pyramid network based on lower feature guidance (FPNetLFG). Finally, experimental results on two datasets show that applying the ResNetSO and FPNetLFG modules to the cascade region-based convolutional neural network (Cascade R-CNN) and to DetectoRS (detecting objects with recursive feature pyramid and switchable atrous convolution) improves the F1 value by 0.8% and 0.3%, respectively, which verifies the effectiveness and general applicability of the method.
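The F1 value reported in the abstract is the harmonic mean of detection precision and recall. The sketch below shows that computation under the usual definition from true positive, false positive, and false negative counts. The function name `f1_score` and the sample counts are hypothetical, chosen only for illustration; they are not the paper's evaluation code or results.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 from counts of matched and unmatched text regions.

    tp: detections matched to a ground-truth text region
    fp: detections with no matching ground-truth region
    fn: ground-truth regions with no matching detection
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (assumption), not taken from the paper.
baseline = f1_score(tp=80, fp=20, fn=20)
improved = f1_score(tp=82, fp=19, fn=18)
print(f"baseline F1 = {baseline:.3f}, improved F1 = {improved:.3f}")
```

Because F1 is a harmonic mean, it rises only when precision and recall improve together or when one improves without a matching loss in the other, which is why it is a common single-number summary for text detectors.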
ZHAO Xiaowei, JI Minghui, XU Xiujuan, SHEN Jiale. Text Detection Model Based on Mask Region Convolution Neural Network[J]. Journal of Applied Sciences, 2023, 41(3): 527-540.
DOI: 10.3969/j.issn.0255-8297.2023.03.013