Text Detection in Natural Scene Based on Visual Attention Model and Multi-scale MSER

doi:10.3969/j.issn.0255-8297.2020.03.015

Abstract

Abstract: Text detection in natural scenes is easily affected by illumination, complex backgrounds, multilingual text, and variations in font and size. To address this, a natural scene text detection algorithm is proposed that combines the Itti visual attention model with multi-scale maximally stable extremal regions (MSER). First, an improved Itti visual attention model is used to extract text feature maps, and different combination strategies are applied to obtain a text saliency map at each scale. These saliency maps are then combined with multi-scale MSER regions to obtain three kinds of text candidate regions. The candidate regions are merged into text lines according to geometric rules relating the characters and the generated text boxes. Finally, a random forest classifier removes non-text regions to yield the final text regions. Experimental results show that the method achieves high detection precision and a degree of robustness for text in natural scene images, including under multilingual text, text distortion, and varying text size.

Key words: natural scene, Itti visual attention model, maximally stable extremal region (MSER), text area detection
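
The abstract does not state the geometric rules used to merge text candidate regions into text lines, so the following is only a minimal sketch under assumed rules, not the authors' method. It treats each candidate as an axis-aligned box and greedily joins boxes that have similar heights, overlap vertically, and sit close together horizontally. The names `Box`, `same_line`, `merge_into_lines`, and the three threshold parameters are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned candidate region: top-left corner plus width and height."""
    x: int
    y: int
    w: int
    h: int

    @property
    def right(self) -> int:
        return self.x + self.w

    @property
    def bottom(self) -> int:
        return self.y + self.h


def union(a: Box, b: Box) -> Box:
    """Smallest box that encloses both a and b."""
    x0, y0 = min(a.x, b.x), min(a.y, b.y)
    x1, y1 = max(a.right, b.right), max(a.bottom, b.bottom)
    return Box(x0, y0, x1 - x0, y1 - y0)


def same_line(a: Box, b: Box,
              max_height_ratio: float = 2.0,
              min_overlap: float = 0.5,
              max_gap_ratio: float = 1.0) -> bool:
    """Assumed geometric rules (illustrative thresholds, not from the paper)."""
    short, tall = sorted((a.h, b.h))
    # Rule 1: characters on one line should have comparable heights.
    if short == 0 or tall / short > max_height_ratio:
        return False
    # Rule 2: the boxes must overlap vertically by a fraction of the shorter one.
    overlap = min(a.bottom, b.bottom) - max(a.y, b.y)
    if overlap < min_overlap * short:
        return False
    # Rule 3: the horizontal gap must be small relative to the taller box.
    gap = max(a.x, b.x) - min(a.right, b.right)
    return gap <= max_gap_ratio * tall


def merge_into_lines(boxes: List[Box]) -> List[Box]:
    """Greedy left-to-right merge of candidate boxes into text-line boxes."""
    lines: List[Box] = []
    for box in sorted(boxes, key=lambda b: b.x):
        for i, line in enumerate(lines):
            if same_line(line, box):
                lines[i] = union(line, box)
                break
        else:
            # No existing line accepted this box, so it starts a new line.
            lines.append(box)
    return lines


# Three neighbouring character-sized boxes plus two isolated regions.
candidates = [
    Box(10, 10, 20, 30), Box(35, 12, 18, 28), Box(58, 11, 22, 30),
    Box(200, 100, 20, 30), Box(12, 200, 10, 10),
]
text_lines = merge_into_lines(candidates)
for line in text_lines:
    print(line)
```

In a full pipeline the candidate boxes would come from the saliency-guided multi-scale MSER stage, and the merged line boxes would then be passed to the random forest classifier for non-text removal. A greedy single pass is used here only because it is short; it can miss merges that depend on processing order.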

CLC number: TP391.41

WANG Daqian, CUI Rongyi, JIN Jingxuan. Text Detection in Natural Scene Based on Visual Attention Model and Multi-scale MSER[J]. Journal of Applied Sciences, 2020, 38(3): 496-506.
