A natural language processing data collection method for weakly standardized, highly specialized petroleum documents
Existing data collection methods misread documents that are weakly standardized and rich in specialized vocabulary, which leads to low accuracy and long collection times. To solve this problem, a fast and accurate data collection model tailored to the characteristics of petroleum engineering documents is established, providing a data basis for big data computing. First, a hierarchical structure of entries is established to identify the differences among petroleum engineering documents. Next, a dictionary of technical terms is built so that the computer can recognize petroleum engineering terminology in the documents. Finally, the SPBERT data collection model is constructed on a natural language model and trained on a large amount of data. The model imports workover-related Word documents and automatically outputs the data they contain together with the corresponding labels. Its accuracy was verified on field workover data from Changqing Oilfield by comparison with two existing regular-expression methods, two general-purpose BERT models, and one GPT model, and the time each model took to collect data was recorded. The average accuracy of these five comparison models was 40.06%, whereas SPBERT reached 82.3% in workover data collection, more than twice that average. SPBERT took 402 milliseconds per correctly collected data set, 27.44% less than the 554-millisecond average of the other models. SPBERT therefore collects supplementary data with high accuracy and short collection time, which can further strengthen the domain specialization of natural language models and promote the digital and intelligent development of oilfields.
Keywords: Data assets; Data collection; Natural language processing; Workover; Smart oilfield; New quality productivity
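The abstract says a dictionary of technical terms lets the computer recognize petroleum engineering terminology, but it does not say how terms are matched against the text. The sketch below shows one common way such a dictionary can be applied: forward maximum matching, which at each position prefers the longest dictionary term. This is a swapped-in classic technique, not necessarily the authors' procedure. The function name `tag_terms`, the dictionary `TERM_DICT`, and its entries and labels are hypothetical placeholders. The sketch covers only dictionary lookup, not the SPBERT model itself.

```python
"""Minimal sketch of dictionary-based term recognition (forward maximum matching).

Assumption: the dictionary entries and labels below are hypothetical placeholders,
not the paper's actual term dictionary.
"""
from typing import Dict, List, Tuple

# Hypothetical term dictionary: lowercase surface form -> label.
TERM_DICT: Dict[str, str] = {
    "workover": "OPERATION",
    "sucker rod": "EQUIPMENT",
    "packer": "EQUIPMENT",
    "tubing": "EQUIPMENT",
    "pump depth": "PARAMETER",
}


def tag_terms(text: str, term_dict: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Return (term as written, label, start index) for each dictionary term found.

    Scans left to right and, at each position, prefers the longest matching
    term (forward maximum matching). Matching is case-insensitive (dictionary
    keys are assumed to be lowercase) and only accepts whole-word matches.
    """
    if not term_dict:
        return []
    lowered = text.lower()
    max_len = max(len(t) for t in term_dict)
    results: List[Tuple[str, str, int]] = []
    i = 0
    while i < len(lowered):
        match = None
        # Try the longest candidate span first, shrinking until a term matches.
        for length in range(min(max_len, len(lowered) - i), 0, -1):
            end = i + length
            candidate = lowered[i:end]
            at_boundary = (i == 0 or not lowered[i - 1].isalnum()) and (
                end == len(lowered) or not lowered[end].isalnum()
            )
            if candidate in term_dict and at_boundary:
                match = (text[i:end], term_dict[candidate], i)
                break
        if match:
            results.append(match)
            i += len(match[0])  # skip past the matched term
        else:
            i += 1
    return results


if __name__ == "__main__":
    sentence = "Workover: pulled the sucker rod and tubing, reset the packer."
    for term, label, start in tag_terms(sentence, TERM_DICT):
        print(f"{start:>3}  {label:<10} {term}")
```

Preferring the longest match keeps a multi-word term such as "sucker rod" from being split into shorter entries if both were in the dictionary, and the word-boundary check stops "packer" from matching inside an unrelated longer word.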