Journal of Applied Sciences (应用科学学报) ›› 2020, Vol. 38 ›› Issue (3): 496-506. doi: 10.3969/j.issn.0255-8297.2020.03.015

• Computer Science and Applications •

Text Detection in Natural Scene Based on Visual Attention Model and Multi-scale MSER

WANG Daqian, CUI Rongyi, JIN Jingxuan

  1. College of Engineering, Yanbian University, Yanji 133002, Jilin Province, China
  • Received: 2018-11-14 Online: 2020-05-31 Published: 2020-06-11
  • Corresponding author: CUI Rongyi, professor; research interests include machine learning and natural language processing. E-mail: cuirongyi@ybu.edu.cn
  • Funding:
    Supported by the Scientific Research Planning Project of the State Language Commission for the 12th Five-Year Plan (No. YB125-178) and the Jilin Province Higher Education Research Project (No. JGJX2019D20)

Abstract: Text detection in natural scenes is easily affected by illumination, complex backgrounds, multilingual text, and variation in font and size. This paper proposes a natural scene text detection algorithm that combines the Itti visual attention model with multi-scale maximally stable extremal regions (MSER). First, text feature maps are extracted with an improved Itti visual attention model, and different combination strategies are applied to obtain a text saliency map at each scale. These saliency maps are then combined with multi-scale MSER regions to obtain three kinds of text candidate regions. The candidate regions are merged into text lines according to geometric rules relating the characters to the generated text boxes. Finally, a random forest classifier removes the non-text regions to give the final text regions. Experimental results show that the proposed method achieves high detection accuracy and a degree of robustness to multilingual text, text distortion, and variation in text size.

Key words: natural scene, Itti visual attention model, maximally stable extremal region (MSER), text area detection
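
The abstract states that candidate regions are merged into text lines by geometric rules but does not give the rules. The following is a minimal Python sketch of one plausible version of that step, not the authors' implementation. The names `same_line` and `merge_into_lines` and the thresholds `max_gap`, `min_overlap`, and `max_height_ratio` are all illustrative assumptions. Boxes are `(x, y, width, height)` tuples.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def same_line(a: Box, b: Box, max_gap: float = 1.0,
              min_overlap: float = 0.5, max_height_ratio: float = 2.0) -> bool:
    """Return True if two candidate boxes plausibly lie on one text line.

    All three thresholds are assumed values chosen for illustration.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Vertical overlap, measured against the shorter box.
    overlap = min(ay + ah, by + bh) - max(ay, by)
    if overlap < min_overlap * min(ah, bh):
        return False
    # Characters on one line should have comparable heights.
    if max(ah, bh) > max_height_ratio * min(ah, bh):
        return False
    # Horizontal gap (negative when the boxes overlap), against the taller box.
    gap = max(ax, bx) - min(ax + aw, bx + bw)
    return gap <= max_gap * max(ah, bh)


def merge_into_lines(boxes: List[Box]) -> List[Box]:
    """Group boxes with union-find; return one bounding box per group."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if same_line(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    # Accumulate each group's extent in corner form (x0, y0, x1, y1).
    groups = {}
    for i, (x, y, w, h) in enumerate(boxes):
        root = find(i)
        x0, y0, x1, y1 = groups.get(root, (x, y, x + w, y + h))
        groups[root] = (min(x0, x), min(y0, y), max(x1, x + w), max(y1, y + h))

    # Convert back to (x, y, width, height), ordered top to bottom.
    return sorted(((x0, y0, x1 - x0, y1 - y0) for x0, y0, x1, y1 in groups.values()),
                  key=lambda b: (b[1], b[0]))


candidates = [(10, 10, 8, 12), (20, 11, 8, 12), (30, 10, 8, 12), (10, 60, 8, 12)]
print(merge_into_lines(candidates))
```

In this example the first three boxes share a row and merge into one line box, while the fourth, which sits much lower, stays separate. In a full pipeline the merged boxes would then go to a classifier, such as the random forest named in the abstract, to reject non-text regions.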
