To enable intelligent management of archival information, a lightweight end-to-end intelligent archive collection system is proposed. First, PP-PicoDet, a lightweight object detection neural network, is used as the layout detector to analyze the layout of archival materials. Then, the SLANet deep learning network is used to recognize table structure. Finally, the open-source PaddleOCR engine is used for text recognition. The system achieves a table recognition accuracy of 75.8% and a printed-text recognition accuracy of 98.3%, with a total inference time of less than 0.85 s. The system offers an effective end-to-end solution for the intelligent collection of archival data and improves the efficiency of archival data sorting.
Key words
intelligent collection of archives/deep learning/optical character recognition/Chinese tables/handwriting recognition
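
The three stages named in the abstract (layout detection, table structure recognition, text recognition) can be pictured as a simple dispatch loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `Region`, `RegionResult`, and `collect_page` are hypothetical, and the three stage functions are placeholder stubs standing in for PP-PicoDet, SLANet, and PaddleOCR rather than calls into those libraries.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class Region:
    """One layout region found on a page (hypothetical type)."""
    kind: str  # "text" or "table"
    box: Box


@dataclass
class RegionResult:
    """Recognized content for one region (hypothetical type)."""
    region: Region
    text: str
    table_html: Optional[str] = None


def collect_page(
    image: object,
    detect_layout: Callable[[object], List[Region]],
    recognize_table: Callable[[object, Box], str],
    recognize_text: Callable[[object, Box], str],
) -> List[RegionResult]:
    """Run layout detection, then table and text recognition per region."""
    results = []
    for region in detect_layout(image):
        # Text recognition runs on every region.
        text = recognize_text(image, region.box)
        # Table structure recognition runs only on table regions.
        html = None
        if region.kind == "table":
            html = recognize_table(image, region.box)
        results.append(RegionResult(region, text, html))
    return results


# Placeholder stages (assumptions): real models would replace these stubs.
def fake_layout(image: object) -> List[Region]:
    return [Region("text", (0, 0, 100, 20)), Region("table", (0, 30, 100, 90))]


def fake_table(image: object, box: Box) -> str:
    return "<table><tr><td></td></tr></table>"


def fake_text(image: object, box: Box) -> str:
    return f"text@{box}"


demo = collect_page(None, fake_layout, fake_table, fake_text)
for item in demo:
    print(item.region.kind, item.text, item.table_html)
```

Passing the stages in as callables keeps the sketch independent of any particular model, so each stub could be swapped for a real detector or recognizer without changing the loop.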