Algorithm Study on Table and Text Information Extraction of Investigation Report
Investigation reports are an important basis for engineering design, yet much of the table and text information they contain is not effectively identified or used. To break through the data barriers that hinder professional software development, this information needs to be identified and extracted reliably. This paper proposes an algorithm and a complete solution for that purpose. Using a file reading and writing library, the Word tables are traversed and the row and column spans of each cell are calculated, so that Word tables are accurately reproduced in Excel. Using document automation technology, the range of each Word table is recorded and the table title is obtained by searching backward from that range. Using a stack data structure, the Word paragraphs are traversed for outline matching and range calculation, which realizes recognition of the Word text information. The data is presented on the software interface by simulating copy-and-paste operations in the background. A multi-threading mechanism is introduced to keep the information extraction from blocking the main thread, and a parallel analysis mechanism is introduced to speed up text analysis, which together improve the overall user experience of the software. Finally, the applicability and accuracy of the algorithm are verified on a real engineering investigation report.
Keywords: algorithm; table information extraction; text information extraction; multi-thread
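
The stack-based outline matching and range calculation mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes each paragraph has already been reduced to a pair (outline level, text), where level 0 marks body text and levels 1 and above mark headings. The names `Section` and `outline_ranges` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Section:
    title: str
    level: int
    start: int                 # index of the heading paragraph
    end: Optional[int] = None  # index of the last paragraph in the section (inclusive)


def outline_ranges(paragraphs: List[Tuple[int, str]]) -> List[Section]:
    """Compute the paragraph range covered by each heading.

    Assumption: paragraphs are (level, text) pairs, level 0 = body text.
    """
    stack: List[Section] = []   # headings whose sections are still open
    closed: List[Section] = []  # headings whose end index is known

    for i, (level, text) in enumerate(paragraphs):
        if level == 0:
            continue  # body text never opens or closes a section
        # A new heading closes every open section at the same or a deeper level.
        while stack and stack[-1].level >= level:
            section = stack.pop()
            section.end = i - 1
            closed.append(section)
        stack.append(Section(text, level, i))

    # Sections still open at the end of the document run to the last paragraph.
    last = len(paragraphs) - 1
    while stack:
        section = stack.pop()
        section.end = last
        closed.append(section)

    closed.sort(key=lambda s: s.start)  # restore document order
    return closed


if __name__ == "__main__":
    doc = [
        (1, "1 Overview"),
        (0, "body"),
        (2, "1.1 Scope"),
        (0, "body"),
        (1, "2 Geology"),
        (0, "body"),
    ]
    for s in outline_ranges(doc):
        print(s.title, s.start, s.end)
```

Each heading is pushed and popped once, so the traversal is linear in the number of paragraphs. Nested subsections are handled naturally because a deeper heading stays on the stack above its parent until a heading of equal or shallower level appears.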