Journal of Applied Sciences (应用科学学报) ›› 2023, Vol. 41 ›› Issue (3): 527-540. doi: 10.3969/j.issn.0255-8297.2023.03.013

• Computer Science and Applications •

A Text Detection Model Using the Mask Region Convolutional Neural Network

ZHAO Xiaowei1,2, JI Minghui1, XU Xiujuan1,2, SHEN Jiale1

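  1. School of Software Technology, Dalian University of Technology, Dalian 116620, Liaoning, China;
    2. Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian University of Technology, Dalian 116620, Liaoning, China
  • Received: 2022-06-30 Online: 2023-05-30 Published: 2023-06-16
  • Corresponding author: XU Xiujuan, associate professor; research interests include natural language processing and urban traffic data processing. E-mail: xjxu@dlut.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (No. 61672128)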

Text Detection Model Based on Mask Region Convolution Neural Network

ZHAO Xiaowei1,2, JI Minghui1, XU Xiujuan1,2, SHEN Jiale1   

  1. School of Software Technology, Dalian University of Technology, Dalian 116620, Liaoning, China;
    2. Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian University of Technology, Dalian 116620, Liaoning, China
  • Received:2022-06-30 Online:2023-05-30 Published:2023-06-16

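Abstract: A text detection model based on the mask region convolutional neural network is proposed. First, with the aim of enlarging the model's receptive field while preserving its efficiency as far as possible, the bottleneck structure of the residual network is optimized to build a residual network based on structural optimization (ResNetSO). Then redundant features are removed to improve the quality of the fused features, and a spatial attention mechanism is applied to the feature pyramid network to build a feature pyramid network based on lower feature guidance (FPNetLFG). Experimental results on two public data sets show that applying the model containing the ResNetSO and FPNetLFG modules to the cascade region convolutional neural network and to the object detection model with recursive feature pyramid and switchable atrous convolution yields F1 gains of about 0.8% and 0.3%, respectively, which demonstrates the effectiveness and general applicability of the method.

Keywords: text detection, mask region convolutional neural network, backbone network, structural optimization, feature pyramid network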

Abstract: This paper proposes a text detection model based on the mask region convolutional neural network (Mask R-CNN). First, to enlarge the receptive field of the model while preserving its efficiency as far as possible, the bottleneck structure of the residual network is optimized, yielding a residual network based on structural optimization (ResNetSO). Second, to remove redundant features and improve the quality of the fused features, a spatial attention mechanism is applied to the feature pyramid network, yielding a feature pyramid network based on lower feature guidance (FPNetLFG). Experimental results on two public data sets show that adding the ResNetSO and FPNetLFG modules to the cascade region convolutional neural network (Cascade R-CNN) and to DetectoRS (detecting objects with recursive feature pyramid and switchable atrous convolution) improves the F1 value by about 0.8% and 0.3%, respectively, which supports the effectiveness and general applicability of the method.
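The F1 value reported above is the harmonic mean of detection precision and recall. The following is a minimal sketch of that computation, not the authors' evaluation code: the function name `f1_score` and all counts are hypothetical and are not taken from the paper, and the abstract does not state whether the reported gains are absolute or relative.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall from detection counts."""
    predicted = true_positives + false_positives  # all detected text regions
    actual = true_positives + false_negatives     # all ground-truth text regions
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts chosen only for illustration; they are not results from the paper.
baseline = f1_score(true_positives=850, false_positives=150, false_negatives=150)
improved = f1_score(true_positives=858, false_positives=142, false_negatives=142)
print(f"baseline F1 = {baseline:.3f}, improved F1 = {improved:.3f}")
```

With these illustrative counts the two scores differ by 0.8 percentage points, which is the order of magnitude of the gain the abstract reports for Cascade R-CNN.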

Key words: text detection, mask region convolution neural network (Mask R-CNN), backbone network, structural optimization, feature pyramid network

CLC number: