Automatic Extraction Technology of Ionospheric Vertical Data with Pin Printer Font
To address the low resolution, disconnected strokes, and touching text lines that are hard to detect in scanned images of ionospheric vertical data printed in pin printer font, an automatic data extraction technique based on the CRNN deep learning framework is proposed. It consists of four modules: image preprocessing, text detection, sequence text recognition, and result layout processing. First, image template matching, noise reduction, and tilt correction are used to preprocess the scanned images of three types of pin-printed vertical data with different line spacings. Then, text detection and segmentation are performed on the preprocessed images by the projection method. The projection segmentation detection algorithm adds vertical projection, horizontal projection, and detection candidate frame correction functions, which allow it to handle touching text regions effectively and improve detection accuracy. Finally, because the segmented text images vary in length, character-level segmentation is avoided and the recognition of the segmented text is recast as a sequence learning problem: the CRNN deep learning algorithm, composed of CNN + RNN + CTC, is used for text recognition, and the recognition results are then saved in a standardized Excel format by a coordinate fusion algorithm, realizing automatic data extraction and storage. The experimental results show that the proposed algorithm achieves a text detection recall of 97.7% and a text recognition F value (comprehensive evaluation index) of 97.49% for single-character recognition and 94.78% for whole-group character recognition. Comparison with other algorithms verifies its effectiveness. The proposed algorithm is therefore highly practical and can meet the actual needs of engineering applications.
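The projection-based text detection step can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `projection_runs` and `detect_text_boxes`, the `col_gap` parameter, and the toy binary image are assumptions made for illustration, and the candidate frame correction mentioned in the abstract is only approximated here by merging column runs separated by small gaps (a plausible way to keep the disconnected dots of a pin printer glyph in one box).

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def projection_runs(profile: List[int], max_gap: int = 0) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where the profile is non-zero.

    Runs separated by at most `max_gap` empty positions are merged.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    # Merge neighbouring runs whose gap is small (hypothetical correction step).
    merged: List[Tuple[int, int]] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


def detect_text_boxes(img: Image, col_gap: int = 0) -> List[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) boxes, all bounds half-open.

    Horizontal projection (ink per row) finds text lines; vertical
    projection (ink per column) inside each line finds text groups.
    """
    boxes = []
    row_profile = [sum(row) for row in img]
    for top, bottom in projection_runs(row_profile):
        band = img[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]
        for left, right in projection_runs(col_profile, max_gap=col_gap):
            boxes.append((top, bottom, left, right))
    return boxes


# Toy image: two text lines; the first contains two blobs two columns apart.
sample = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
]
strict_boxes = detect_text_boxes(sample)
merged_boxes = detect_text_boxes(sample, col_gap=2)
```

With `col_gap=0` the two blobs on the first line are reported as separate boxes; with `col_gap=2` they are merged into one group, which is the kind of behavior needed when each detected box is passed whole to a sequence recognizer rather than split into characters.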