Identification of Problem and Method in Scientific Papers Based on Formulaic Expression Desensitization and Enhanced Boundary Recognition
Problems and methods are crucial components of scientific papers and play a significant role in the organization, management, retrieval, and evaluation of scientific papers. To alleviate the dependency on formulaic expressions and the word boundary recognition errors found in existing problem and method recognition methods, we propose a model that combines formulaic expression desensitization with enhanced boundary recognition. Specifically, formulaic expression desensitization is achieved through data augmentation methods, whereas boundary enhancement utilizes pointer networks and sequence labeling models. With open access to scientific papers, researchers are using full-text papers for entity recognition tasks. To demonstrate the importance of using full-text papers, this paper manually constructs annotated abstract and full-text datasets in the field of natural language processing. Numerical and content-based metrics are designed to compare the problems, methods, and their relationships extracted from the two datasets. The results of ten-fold cross-validation experiments indicate that the proposed model significantly outperforms baseline models such as SciBERT-BiLSTM-CRF, with a macro-average F1 score improvement of 3.69 percentage points. When comparing entity recognition and relationship extraction results between abstracts and full texts, this paper shows that problem and method entities in abstracts have a broader semantic representation, whereas full texts contain more detailed entities and relationships that describe model design and training procedures.
knowledge entity recognition; identification of problem and method; pointer networks; data augmentation