Text Detection Model Based on Mask Region Convolution Neural Network

Journal of Applied Sciences, 2023, Vol. 41, Issue 3: 527-540. doi:10.3969/j.issn.0255-8297.2023.03.013

Section: Computer Science and Applications

ZHAO Xiaowei^1,2, JI Minghui^1, XU Xiujuan^1,2, SHEN Jiale^1

1. School of Software Technology, Dalian University of Technology, Dalian 116620, Liaoning, China;
2. Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian University of Technology, Dalian 116620, Liaoning, China

Received: 2022-06-30 Online: 2023-05-30 Published: 2023-06-16

Corresponding author: XU Xiujuan, associate professor; research interests: natural language processing and urban traffic data processing. E-mail: xjxu@dlut.edu.cn

Funding: supported by the National Natural Science Foundation of China (No. 61672128)

Abstract

This paper proposes a text detection model based on the mask region convolutional neural network (Mask R-CNN). First, to enlarge the model's receptive field while preserving its efficiency as much as possible, the bottleneck structure of the residual network is optimized, yielding a residual network based on structural optimization (ResNetSO). Then, to remove redundant features and improve the quality of the fused features, a spatial attention mechanism is applied to the feature pyramid network, yielding a feature pyramid network based on lower feature guidance (FPNetLFG). Experimental results on two public datasets show that applying the proposed model, which consists of the ResNetSO and FPNetLFG modules, to the Cascade R-CNN (cascade region convolutional neural network) and DetectoRS (detecting objects with recursive feature pyramid and switchable atrous convolution) detectors improves the F1 score by approximately 0.8% and 0.3%, respectively, which verifies the effectiveness and general applicability of the method.
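The abstract motivates ResNetSO by enlarging the receptive field of the residual bottleneck without giving up much efficiency, but it does not state the modified structure. The sketch below only illustrates that trade-off, using the standard receptive-field recurrence: a layer with kernel k and dilation d has effective kernel d(k-1)+1, and each layer adds (effective kernel - 1) times the accumulated stride. The two variants (a dilated 3×3 convolution and two stacked 3×3 convolutions) are hypothetical examples chosen for illustration, not the ResNetSO design; the `Conv` class and both helper functions are likewise illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conv:
    """One convolution layer: square kernel, channel counts, stride, dilation."""
    kernel: int
    in_ch: int
    out_ch: int
    stride: int = 1
    dilation: int = 1


def receptive_field(layers, rf=1, jump=1):
    """Return (receptive field, cumulative stride) after `layers`, in input pixels."""
    for layer in layers:
        effective_k = layer.dilation * (layer.kernel - 1) + 1
        rf += (effective_k - 1) * jump
        jump *= layer.stride
    return rf, jump


def weight_count(layers):
    """Convolution weights only (biases and batch norm are ignored)."""
    return sum(layer.kernel ** 2 * layer.in_ch * layer.out_ch for layer in layers)


# Standard ResNet bottleneck: 1x1 reduce, 3x3, 1x1 expand (256 -> 64 -> 256).
standard = [Conv(1, 256, 64), Conv(3, 64, 64), Conv(1, 64, 256)]
# Hypothetical variant: same layers, but the 3x3 conv uses dilation 2.
dilated = [Conv(1, 256, 64), Conv(3, 64, 64, dilation=2), Conv(1, 64, 256)]
# Hypothetical variant: two stacked 3x3 convs in the middle.
stacked = [Conv(1, 256, 64), Conv(3, 64, 64), Conv(3, 64, 64), Conv(1, 64, 256)]

for name, block in [("standard", standard), ("dilated", dilated), ("stacked", stacked)]:
    rf, _ = receptive_field(block)
    print(f"{name:<8} receptive field={rf} weights={weight_count(block)}")
```

Under these assumptions, the dilated variant widens the block's receptive field from 3 to 5 pixels with the same 69,632 weights as the standard bottleneck, while stacking a second 3×3 convolution reaches the same receptive field at 106,496 weights. The abstract does not say which trade-off ResNetSO makes.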

Key words: text detection, mask region convolutional neural network (Mask R-CNN), backbone network, structural optimization, feature pyramid network

CLC number: TP391

ZHAO Xiaowei, JI Minghui, XU Xiujuan, SHEN Jiale. Text Detection Model Based on Mask Region Convolution Neural Network[J]. Journal of Applied Sciences, 2023, 41(3): 527-540.

