Identifying Structural Function of Scientific Literature Abstracts Based on Deep Active Learning
[Objective] This paper explores different deep active learning (DeepAL) methods for identifying the structural function of scientific literature abstracts and examines their labeling costs. [Methods] First, we constructed a SciBERT-BiLSTM-CRF model for abstracts (SBCA), which exploits the contextual sequence information between sentences. Then, we developed uncertainty-based active learning strategies at two granularities: single sentences and full abstracts. Finally, we conducted experiments on the PubMed 20K dataset. [Results] The SBCA model achieved the best recognition performance, increasing the F1 value by 11.93% compared with a SciBERT model that does not use sequence information. With the Least Confidence strategy applied to whole abstracts, the SBCA model reached its optimal F1 value using 60% of the experimental data; with the Least Confidence strategy applied to individual sentences, it reached its optimal F1 value using 65% of the experimental data. [Limitations] Future work should examine different active learning strategies on datasets from more fields and in more languages. [Conclusions] The new model based on deep active learning can identify the structural function of scientific literature abstracts at a lower annotation cost.
Keywords: Deep Learning; Document Structural Function Identification; Move; Active Learning; Knowledge Organization
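As a rough illustration of the two uncertainty strategies described in the abstract, the Python sketch below ranks unlabeled items by Least Confidence, i.e. one minus the probability of the most likely label. It is a minimal sketch under stated assumptions: each sentence is assumed to come with a predicted label distribution, the function names are hypothetical, and scoring an abstract by the mean of its sentence scores is an assumed aggregation rather than necessarily the one used with SBCA (a CRF could instead score the whole label sequence).

```python
from typing import List, Sequence

# A label distribution for one sentence, e.g. over the PubMed 20K labels.
Dist = Sequence[float]


def least_confidence(dist: Dist) -> float:
    """Least Confidence score: 1 - probability of the most likely label."""
    return 1.0 - max(dist)


def select_sentences(sentence_dists: List[Dist], k: int) -> List[int]:
    """Sentence-level strategy: indices of the k most uncertain sentences."""
    scores = [least_confidence(d) for d in sentence_dists]
    # sorted() is stable, so ties keep their original pool order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def abstract_score(abstract: List[Dist]) -> float:
    """ASSUMPTION: an abstract's uncertainty is the mean of its sentences' scores."""
    return sum(least_confidence(d) for d in abstract) / len(abstract)


def select_abstracts(pool: List[List[Dist]], k: int) -> List[int]:
    """Abstract-level strategy: indices of the k most uncertain abstracts."""
    scores = [abstract_score(a) for a in pool]
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Two toy abstracts with two sentences each (made-up probabilities).
    pool = [
        [[0.9, 0.05, 0.05, 0.0, 0.0], [0.8, 0.1, 0.1, 0.0, 0.0]],
        [[0.4, 0.3, 0.3, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0, 0.0]],
    ]
    print(select_abstracts(pool, 1))
    flat = [s for abstract in pool for s in abstract]
    print(select_sentences(flat, 2))
```

Sentence-level selection sends only the most uncertain sentences to annotators, whereas abstract-level selection sends whole abstracts, which keeps the inter-sentence context that a sequence model such as SBCA uses.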