Research on intelligent collection method of archives based on OCR technology
In order to realize the intelligent management of file information,a lightweight end-to-end intelligent file collection system is proposed.Firstly,a lightweight object detection neural network PP-PicoDet is used as a layout detector to analyze the layout of archival materials.Then,SLANet deep learning neural network is used for structural recognition of the tables.Finally,the open source Paddle OCR engine is used for text recognition.The accuracy of the system for table recognition is 75.8%,the accuracy of printed text recognition is 98.3%,and the total reasoning time is less than 0.85s.This system brings forward an effective solution to realize the intelligent collection of file data from end to end and improve the efficiency of file data sorting.
intelligent collection of archivesdeep learningoptical character recongnitionChinese formhandwriting recognition