Text Detection Model Based on Mask Region Convolution Neural Network

Zhao Xiaowei, Ji Minghui, Xu Xiujuan, Shen Jiale

doi:10.3969/j.issn.0255-8297.2023.03.013

Journal of Applied Sciences, 2023, Vol. 41, Issue 3: 527-540


Computer Science and Applications

1. School of Software Technology, Dalian University of Technology, Dalian 116620, Liaoning, China;
2. Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian University of Technology, Dalian 116620, Liaoning, China

Received date: 2022-06-30

Online published: 2023-06-16

Funding: Supported by the National Natural Science Foundation of China (No. 61672128)

Cite this article

Zhao Xiaowei, Ji Minghui, Xu Xiujuan, Shen Jiale. Text detection model based on mask region convolution neural network [J]. Journal of Applied Sciences, 2023, 41(3): 527-540. DOI: 10.3969/j.issn.0255-8297.2023.03.013

Abstract

This paper proposes a text detection model based on the mask region convolutional neural network (Mask R-CNN). First, to enlarge the receptive field of the model while preserving its efficiency as far as possible, the bottleneck structure of the residual network is optimized, yielding a residual network based on structural optimization (ResNetSO). Second, to remove redundant features and improve the quality of the fused features, a spatial attention mechanism is applied to the feature pyramid network, yielding a feature pyramid network based on lower feature guidance (FPNetLFG). Experiments on two public datasets show that applying the ResNetSO and FPNetLFG modules to the cascade region convolutional neural network (Cascade R-CNN) and to DetectoRS (detecting objects with recursive feature pyramid and switchable atrous convolution) improves the F1 value by about 0.8% and 0.3%, respectively, which supports the effectiveness and general applicability of the method.
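The abstract does not say how ResNetSO enlarges the receptive field, so the following is only a hedged illustration of the general trade-off it mentions. It computes the receptive field of a stack of convolution layers and compares a plain bottleneck-like stack with a hypothetical variant whose 3x3 convolution uses dilation 2. The layer configurations and the use of dilation are assumptions made for this example, not the authors' design.

```python
def receptive_field(layers):
    """Return the receptive field, in input pixels, of a stack of conv layers.

    Each layer is a (kernel_size, stride, dilation) tuple.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # spacing, in input pixels, between adjacent output positions
    for kernel_size, stride, dilation in layers:
        # A dilated kernel spans (kernel_size - 1) * dilation + 1 positions.
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


# Bottleneck-like stack: 1x1 conv, 3x3 conv, 1x1 conv, all with stride 1.
plain = [(1, 1, 1), (3, 1, 1), (1, 1, 1)]

# Hypothetical variant (assumption): same stack, 3x3 conv with dilation 2.
# The number of weights is unchanged; only the sampling pattern widens.
dilated = [(1, 1, 1), (3, 1, 2), (1, 1, 1)]

print(receptive_field(plain))    # 3
print(receptive_field(dilated))  # 5
```

In this sketch the dilated variant widens the receptive field from 3 to 5 input pixels without adding parameters, which is one common way to trade coverage against cost.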

Keywords: text detection; mask region convolution neural network (Mask R-CNN); backbone network; structural optimization; feature pyramid network
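The abstract reports its gains as F1 value. As a reminder of what that metric measures, here is a minimal sketch of F1 as the harmonic mean of precision and recall over detection counts. The function name and the example counts are illustrative assumptions; they are not results from the paper, and the matching rule that decides what counts as a true positive is not specified here.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Return F1, the harmonic mean of precision and recall."""
    detected = true_positives + false_positives
    ground_truth = true_positives + false_negatives
    # Guard against empty predictions or empty ground truth.
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / ground_truth if ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 correct detections, 20 spurious, 20 missed.
print(round(f1_score(80, 20, 20), 3))  # 0.8
```

Because F1 balances precision and recall, a detector cannot raise it simply by predicting more boxes or fewer boxes.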
