Journal of Applied Sciences ›› 2011, Vol. 29 ›› Issue (1): 56-60. doi: 10.3969/j.issn.0255-8297.2011.01.010

• Signal and Information Processing •

Identification of Scripts in Document Images Using Basic Image Features

GUO Long, PING Xi-jian, ZHOU Lin, TONG Li

  1. Institute of Information Engineering, PLA Information Engineering University, Zhengzhou 450002, China
  • Received: 2010-09-14 Revised: 2010-10-21 Online: 2011-01-26 Published: 2011-01-25
  • Corresponding author: PING Xi-jian, professor and doctoral supervisor; research interests: image processing, pattern recognition, computer vision, and information hiding. E-mail: pingxj@126.com
  • Supported by:

    National Natural Science Foundation of China (No. 60970172)

Abstract:

Existing script identification methods trade computation speed against identification accuracy. To resolve this conflict, this paper applies basic image features (BIF) to script identification in document images. Following the structural approach to texture analysis, texture primitives are divided into seven types. Features are extracted that describe the composition of the texture primitives in a document image and their spatial relationships, and a support vector machine (SVM) is trained on these features to classify the script. Experiments were performed on degraded document images in ten languages, including Chinese, English, Russian, Japanese, Korean, and Arabic. The results show that the proposed method is fast, achieves high identification accuracy, and is robust to image degradation.

Key words: document image, script identification, basic image feature (BIF), support vector machine (SVM)
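
The abstract does not give the feature definitions, so the following is only an illustrative sketch of what "composition and spatial relationship" features could look like. It assumes each pixel has already been assigned one of the seven primitive types (labels 0 to 6). The function name `primitive_features`, the choice of a horizontal co-occurrence table, and the 56-element layout are assumptions made for illustration, not the authors' method.

```python
from collections import Counter

NUM_TYPES = 7  # the abstract states seven texture-primitive types


def primitive_features(labels):
    """Build a feature vector from a 2-D grid of primitive-type labels (0..6).

    Returns 7 composition frequencies followed by 49 co-occurrence
    frequencies of horizontally adjacent label pairs (an assumed stand-in
    for the paper's spatial-relationship features).
    """
    flat = [v for row in labels for v in row]
    if not flat:
        raise ValueError("empty label grid")
    if any(not 0 <= v < NUM_TYPES for v in flat):
        raise ValueError("labels must lie in 0..6")

    # Composition: relative frequency of each primitive type.
    counts = Counter(flat)
    hist = [counts.get(t, 0) / len(flat) for t in range(NUM_TYPES)]

    # Spatial relation: how often type a sits immediately left of type b.
    pairs = Counter((a, b) for row in labels for a, b in zip(row, row[1:]))
    total = sum(pairs.values()) or 1  # guard against single-column grids
    cooc = [pairs.get((a, b), 0) / total
            for a in range(NUM_TYPES) for b in range(NUM_TYPES)]
    return hist + cooc


# Tiny hypothetical label grid standing in for a labelled document image.
grid = [
    [0, 0, 1],
    [2, 2, 1],
]
features = primitive_features(grid)
```

A vector of this kind, computed per document image, is the sort of input that could then be passed to an SVM classifier for training and script classification.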

CLC number: