Text Detection in Natural Scene Based on Visual Attention Model and Multi-scale MSER

WANG Daqian; CUI Rongyi; JIN Jingxuan

doi:10.3969/j.issn.0255-8297.2020.03.015

Journal of Applied Sciences, 2020, Vol. 38, Issue 3: 496-506

Computer Science and Applications

College of Engineering, Yanbian University, Yanji 133002, Jilin province, China

Received date: 2018-11-14

Online published: 2020-06-11

Abstract

Current natural scene text detection algorithms suffer from low accuracy caused by illumination changes, complex backgrounds, multiple languages, and diverse fonts and sizes. To address this, a natural scene text detection algorithm based on the Itti visual saliency model and multi-scale maximally stable extremal regions (MSER) is proposed. First, a text feature map is extracted with an improved Itti visual attention model, and text saliency maps at different scales are obtained using different combination strategies. These saliency maps are then combined with multi-scale MSER regions to produce three kinds of text candidate regions. Text lines are formed from the candidate regions according to geometric rules of text, and text boxes are generated. Finally, a random forest classifier removes non-text regions to obtain the text areas. Experimental results show that the proposed algorithm achieves high detection accuracy and remains robust to multiple languages, text distortion, and variations in text size.
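The step that turns text candidate regions into text lines can be illustrated with a minimal Python sketch. It assumes the candidates (from saliency maps combined with multi-scale MSER) are already available as axis-aligned boxes carrying a mean saliency score. The `Box` class, the saliency threshold, and the three geometric rules (height ratio, vertical overlap, horizontal gap) with their values are hypothetical choices for illustration, not the rules published in the paper. The random forest verification step is not shown.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A text candidate region (hypothetical representation)."""
    x: int            # left edge in pixels
    y: int            # top edge in pixels
    w: int            # width in pixels
    h: int            # height in pixels, assumed > 0
    saliency: float   # mean text-saliency value inside the box, assumed in [0, 1]


def keep_salient(boxes, min_saliency=0.5):
    """Keep candidates whose mean saliency reaches a hypothetical threshold."""
    return [b for b in boxes if b.saliency >= min_saliency]


def same_line(a, b, max_height_ratio=1.5, min_v_overlap=0.5, max_gap_factor=1.0):
    """Hypothetical geometric rules: two characters on one horizontal text line."""
    h_small, h_big = sorted((a.h, b.h))
    if h_big / h_small > max_height_ratio:           # heights must be similar
        return False
    overlap = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    if overlap / h_small < min_v_overlap:            # boxes must overlap vertically
        return False
    gap = max(a.x, b.x) - min(a.x + a.w, b.x + b.w)  # negative if boxes overlap
    return gap <= max_gap_factor * h_big             # horizontal gap must be small


def group_lines(boxes, min_chars=2):
    """Chain boxes with union-find and return one bounding box (x, y, w, h) per line."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if same_line(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, b in enumerate(boxes):
        groups.setdefault(find(i), []).append(b)

    lines = []
    for members in groups.values():
        if len(members) < min_chars:  # isolated candidates do not form a line
            continue
        x0 = min(b.x for b in members)
        y0 = min(b.y for b in members)
        x1 = max(b.x + b.w for b in members)
        y1 = max(b.y + b.h for b in members)
        lines.append((x0, y0, x1 - x0, y1 - y0))
    return sorted(lines)


candidates = [
    Box(10, 20, 8, 12, 0.9),
    Box(20, 21, 8, 11, 0.8),
    Box(30, 20, 9, 12, 0.7),
    Box(100, 80, 40, 5, 0.9),  # salient but too flat to join the line
    Box(12, 60, 8, 12, 0.1),   # low saliency, dropped before grouping
]
lines = group_lines(keep_salient(candidates))
print(lines)  # [(10, 20, 29, 12)]
```

Union-find is used so that characters chained left to right end up in one line even when the first and last characters are far apart. In a full pipeline, each resulting text box would still be verified by a classifier, such as the random forest mentioned above, before being reported as text.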

Key words: natural scene; Itti visual attention model; maximally stable extremal region (MSER); text area detection

Cite this article

WANG Daqian, CUI Rongyi, JIN Jingxuan. Text Detection in Natural Scene Based on Visual Attention Model and Multi-scale MSER[J]. Journal of Applied Sciences, 2020, 38(3): 496-506. DOI: 10.3969/j.issn.0255-8297.2020.03.015
