Automatic Extraction Technology of Ionospheric Vertical Data with Pin Printer Font

Scanned images of ionospheric vertical sounding data printed in a pin (dot-matrix) printer font suffer from low resolution, disconnected character strokes, and touching text lines that defeat text detection. To address these problems, an automatic data extraction technique based on the CRNN deep learning framework is proposed. It comprises four modules: image preprocessing, text detection, sequence text recognition, and layout processing of the recognition results. First, scanned images of pin-printed vertical sounding data with three different line-spacing types are preprocessed by image template matching, noise reduction, and tilt correction. Next, text in the preprocessed images is detected and segmented by the projection method; the projection segmentation algorithm combines vertical projection, horizontal projection, and correction of the candidate detection boxes, which handles touching text regions effectively and improves detection accuracy. Finally, because the segmented image arrays vary in length and splitting them into individual characters should be avoided, recognition of the segmented text is recast as a sequence learning problem and carried out with the CRNN algorithm (CNN + RNN + CTC). A coordinate fusion algorithm then saves the recognition results in a standardized Excel format, completing automatic extraction and storage of the data. Experimental results show a text detection recall of 97.7% and a text recognition F-measure of 97.49% at the single-character level and 94.78% at the whole-group level. Comparison with other algorithms verifies the effectiveness of the method, which is practical enough to meet the needs of engineering applications.
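
The projection-based detection step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (runs, merge_gaps, detect_fields) and the parameters min_ink and char_gap are hypothetical, and the gap-merging rule is only a simple stand-in for the candidate-box correction mentioned in the abstract. The sketch assumes an already binarized, deskewed image in which ink pixels are 1 and background pixels are 0.

```python
import numpy as np

def runs(mask):
    """Return [start, end) index pairs for each run of True values in a 1-D mask."""
    padded = np.concatenate(([False], np.asarray(mask, dtype=bool), [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    # Edges alternate: rising edge (run start), falling edge (run end, exclusive).
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]

def merge_gaps(spans, max_gap):
    """Merge neighbouring spans separated by at most max_gap background pixels."""
    merged = []
    for start, end in spans:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged

def detect_fields(binary, min_ink=1, char_gap=2):
    """Find text boxes as (top, bottom, left, right), end indices exclusive.

    Assumption: binary is a 2-D array with 1 = ink and 0 = background.
    """
    boxes = []
    row_profile = binary.sum(axis=1)  # horizontal projection: ink count per row
    for top, bottom in runs(row_profile >= min_ink):
        line = binary[top:bottom]
        col_profile = line.sum(axis=0)  # vertical projection inside one text line
        # Small gaps (dot-matrix strokes, spacing inside a number group) are merged
        # so that a whole group stays in one box instead of being cut into characters.
        for left, right in merge_gaps(runs(col_profile >= min_ink), char_gap):
            boxes.append((top, bottom, left, right))
    return boxes

if __name__ == "__main__":
    demo = np.zeros((12, 30), dtype=np.uint8)
    demo[1:4, 2:5] = 1    # first glyph of a group
    demo[1:4, 6:9] = 1    # second glyph, one empty column away
    demo[1:4, 15:20] = 1  # a separate group on the same line
    demo[7:10, 3:8] = 1   # a group on the next line
    print(detect_fields(demo, char_gap=2))
```

Each resulting box would then be cropped and passed as a whole to the sequence recognizer, which is why the sketch merges narrow gaps rather than isolating single characters. Suitable values for min_ink and char_gap depend on scan resolution and would need tuning on real scans.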

ionosphere; pin printer font; projection segmentation; text detection; CRNN; text recognition

Su Guichang (苏桂昌), Zhang Ruikun (张瑞坤), Liu Xiangpeng (刘祥鹏)

College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China

National Natural Science Foundation of China (grants 62103215 and 12001308)

2024

Journal of Qingdao University of Science and Technology (Natural Science Edition)
Qingdao University of Science and Technology

CSTPCD
Impact factor: 0.297
ISSN: 1672-6987
Year, volume (issue): 2024, 45(1)