Template-based Handwritten Form Information Extraction Method Using Deep Learning
Handwritten paper forms serve as crucial data carriers for information exchange among various departments in manufacturing enterprises,and the extraction of key information is of great significance for production,management,and decision-making.However,current solutions for handwritten form information extraction face challenges in accurately and rapidly extracting key information from complex text layouts.To address this issue,a two-stage template-based handwritten form information extraction method is proposed,requiring only one image to complete template construction,focusing on user-relevant information,and avoiding potential logical errors in traditional relationship extraction tasks in complex tables.Initially,for a specific type of table image,the desired recognition areas are directly annotated on the image,and corresponding key values are assigned to these areas.Subsequently,a high-resolution network is em-ployed to improve the detection precision of small text,and a strategy of uniform segmentation with shuffling at multiple resolutions is proposed to achieve good performance in both performance and parameters for the detection model.Simultaneously,the introduction of temporal convolutional networks and self-attention mechanisms enables the recognition model to better handle the blurriness,unclearness,and stroke omissions caused by handwriting speed and writing tools.After recognition,the system binds the recognition results with preset key values to form structured output.Experimental results demonstrate that compared to the typical ResNet50 model with almost equal parameters,the precision of small text detection is improved by 15.8 percentage points.In text recognition tasks,the model achieves a character precision of 99.30%on the CASIA-HWDB2.0-2.2 dataset.Even in cases where the text box does not completely cover the entire text line,the character precision only drops by 0.55 percentage points,indicating that the text recognition model exhibits good ro-bustness.
information extractionhandwritten formtemplate-basedhandwritten text recognitiontext line detection