Journal of Applied Sciences ›› 2023, Vol. 41 ›› Issue (6): 1058-1067. doi: 10.3969/j.issn.0255-8297.2023.06.012

• Computer Science and Applications •

Auto-Checking Stamped Document Image Based on OCR and Image Detection

CAO Jing1, CHEN Kang1, QI Ning1, XIA Pengcheng1, QIU Yu2

  1. Jiangsu United Credit Co., Ltd., Nanjing 210000, Jiangsu, China;
  2. Software Institute, Nanjing University, Nanjing 210093, Jiangsu, China
  • Received: 2021-12-01 Online: 2023-11-30 Published: 2023-11-30
  • Corresponding author: CAO Jing, engineer, research interest: financial technology. E-mail: caojing103@126.com

Abstract: We design and implement an automatic checking method, based on OCR and image detection, for stamped document images, whose manual review is time-consuming, inefficient, and of unguaranteed accuracy. The method consists of three parts: text recognition, seal recognition, and form content checking. Text recognition uses the SegLink algorithm to detect angled text and a convolutional recurrent neural network (CRNN) for variable-length, end-to-end text recognition. Seal recognition uses the YOLOv3 algorithm to detect and extract seals, and a polar coordinate transformation to recognize seal content. Content checking applies preset rules to verify the completeness and correctness of the content extracted from the form. Experimental results show that the proposed method achieves high checking accuracy on stamped document images of this kind.

Key words: automated examining, text recognition, seal recognition, convolutional recurrent neural network (CRNN)
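
The abstract names a polar coordinate transformation for reading seal content but gives no implementation details. The sketch below shows one common way such a step can work: the ring of text around a circular seal is unwrapped into a straight strip that a line-based recognizer could then read. The function name `unwrap_ring`, its parameters, and the nearest-neighbour sampling are assumptions made for illustration, not the authors' implementation; the seal centre and radii are assumed to be known from an earlier detection step.

```python
import math


def unwrap_ring(image, center, r_inner, r_outer, n_angles=360):
    """Unwrap an annulus of a grayscale image into a rectangular strip.

    image   : list of rows, each a list of pixel values (hypothetical input format)
    center  : (cx, cy) seal centre in pixel coordinates
    r_inner : smallest radius sampled (inclusive)
    r_outer : largest radius sampled (inclusive)

    Row i of the result holds radius r_outer - i (outer edge on top);
    column j holds angle 2*pi*j/n_angles. Because the image y axis points
    down, increasing angle moves clockwise on screen.
    Samples that fall outside the image are set to 0.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cx, cy = center
    strip = []
    for r in range(r_outer, r_inner - 1, -1):
        row = []
        for j in range(n_angles):
            theta = 2.0 * math.pi * j / n_angles
            # Polar -> Cartesian, then nearest-neighbour lookup.
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            if 0 <= x < width and 0 <= y < height:
                row.append(image[y][x])
            else:
                row.append(0)
        strip.append(row)
    return strip
```

Called as `unwrap_ring(img, (cx, cy), r_inner, r_outer)`, this returns a strip of `r_outer - r_inner + 1` rows by `n_angles` columns. Depending on where the seal text starts and its reading direction, the strip may still need to be rotated or flipped before recognition.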

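
The content-checking stage is described only as applying preset rules for completeness and correctness. A minimal sketch of that idea follows, assuming the recognized form has already been reduced to a dictionary of field names and strings. The `Rule` class, the field names (`company_name`, `date`, `seal_text`), and the specific rules are all hypothetical and chosen only to illustrate the pattern.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Rule:
    field: str                                        # field name in the extracted form
    required: bool = True                             # completeness: must be non-empty
    validator: Optional[Callable[[str], bool]] = None  # correctness check, if any
    message: str = "invalid value"


def check_form(fields: Dict[str, str], rules: List[Rule]) -> List[str]:
    """Return a list of problems; an empty list means the form passed."""
    problems = []
    for rule in rules:
        value = (fields.get(rule.field) or "").strip()
        if not value:
            if rule.required:
                problems.append(f"{rule.field}: missing")
            continue
        if rule.validator is not None and not rule.validator(value):
            problems.append(f"{rule.field}: {rule.message}")
    return problems


def check_seal_matches(fields: Dict[str, str],
                       name_field: str = "company_name",
                       seal_field: str = "seal_text") -> List[str]:
    """Hypothetical cross-field rule: the seal text should contain the name."""
    name = (fields.get(name_field) or "").strip()
    seal = (fields.get(seal_field) or "").strip()
    if name and seal and name not in seal:
        return [f"{seal_field}: does not contain {name_field}"]
    return []


# Hypothetical rule set for illustration only.
RULES = [
    Rule("company_name"),
    Rule("date",
         validator=lambda s: re.fullmatch(r"\d{4}-\d{2}-\d{2}", s) is not None,
         message="expected YYYY-MM-DD"),
    Rule("seal_text"),
]
```

Keeping the rules as data rather than hard-coded branches means a new document type would only need a new rule list, which is one plausible reading of "preset rules" in the abstract.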