Extraction of Fine-grained Research Methods in the Field of Information Science
[Purpose/significance] Research methods are one of the key research directions in information science. Constructing a fine-grained research method corpus and extracting research method entities can help scholars quickly understand the methods used in the field and explore how those methods evolve and are likely to develop. It also lays a foundation for services and applications built on the research method corpus in the ongoing wave of digitalization.
[Method/process] First, 50 articles were randomly selected from those published in the Journal of the China Society for Scientific and Technical Information between 2000 and 2023. The research method entities in these articles were manually annotated to serve as the training corpus for entity extraction. Second, two models, BERT-base-chinese and Chinese-BERT-wwm-ext, were trained for entity extraction, and the better-performing model was adopted as the final entity extraction model for this study.
[Result/conclusion] This paper constructs a fine-grained annotated corpus of research methods in information science that covers six entity types: theory, method, dataset, indicator, tool, and other. When trained on the manually annotated corpus, the Chinese-BERT-wwm-ext model performed better, achieving a precision of 0.8082, a recall of 0.8467, and an F1 score of 0.8270. An analysis of the extracted research method entities and their categories further shows that research methods in information science are becoming increasingly diverse, with emerging technologies coexisting alongside traditional methods, each showing its own strengths.
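The reported scores are consistent with the standard definition of F1 as the harmonic mean of precision and recall: 2 × 0.8082 × 0.8467 / (0.8082 + 0.8467) ≈ 0.8270. The abstract does not state how the scores were computed, so the following is only a minimal sketch. It assumes entity-level evaluation with exact span matching over BIO-tagged token sequences. The tag names (`B-METHOD`, `I-TOOL`, and so on) and the helper functions `extract_spans`, `f1`, and `precision_recall_f1` are hypothetical illustrations, not the authors' code or label set.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical labels for the six entity types named in the abstract;
# the paper's actual tag set is not given.
ENTITY_TYPES = ["THEORY", "METHOD", "DATASET", "INDICATOR", "TOOL", "OTHER"]

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sequence (assumed scheme)."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start: Optional[int] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last span
        # A B- tag, an O tag, or an I- tag of a different type ends the current span.
        if tag.startswith("B-") or tag == "O" or (
            tag.startswith("I-") and tag[2:] != etype
        ):
            if etype is not None:
                spans.add((etype, start, i))
            etype, start = (tag[2:], i) if tag != "O" else (None, None)
        # An I- tag of the same type simply extends the current span.
    return spans


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged entity-level P/R/F1: a predicted span counts as correct
    only if its type and both boundaries match a gold span exactly."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = extract_spans(g), extract_spans(p)
        tp += len(gold_spans & pred_spans)
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall, f1(precision, recall)


if __name__ == "__main__":
    # Toy sequences with invented tags, not data from the paper.
    gold = [["B-METHOD", "I-METHOD", "O", "B-TOOL"], ["B-DATASET", "O", "B-INDICATOR"]]
    pred = [["B-METHOD", "I-METHOD", "O", "O"], ["B-DATASET", "O", "B-METHOD"]]
    print(precision_recall_f1(gold, pred))  # P = 2/3, R = 1/2, F1 = 4/7
    # The precision and recall reported in the abstract reproduce its F1.
    print(round(f1(0.8082, 0.8467), 4))  # 0.827
```

Exact-match scoring is strict: a prediction with the right type but a boundary that is off by one token counts as both a false positive and a false negative. Whether the paper used this criterion or a lenient or token-level one is not stated in the abstract.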
Keywords: information science; corpus of research method; research method entities; research method recognition; fine-grained research method