Extraction of Fine-grained Research Methods in the Field of Information Science

[Purpose/significance] Research methods are an important research direction in information science. Constructing a fine-grained research method corpus and extracting research method entities helps scholars quickly understand the methods used in the field, explore how those methods have evolved and where they may be heading, and lays the foundation for services and applications built on the corpus in the ongoing wave of digitization. [Method/process] First, taking academic articles published in the Journal of the China Society for Scientific and Technical Information (《情报学报》) from 2000 to 2023 as the data source, this study randomly selected 50 articles and manually annotated the research method entities in them to form the training corpus for entity extraction. Second, two models, BERT-base-chinese and Chinese-BERT-wwm-ext, were trained, and the better-performing one was chosen as the entity extraction model. Finally, the chosen model was used to extract fine-grained research method entities from the unannotated corpus. [Result/conclusion] This paper constructs a fine-grained research method annotation corpus for information science covering six entity types: theory, method, dataset, indicator, tool, and other. When trained on the manually annotated corpus, Chinese-BERT-wwm-ext performed better, with precision, recall, and F1 of 0.8082, 0.8467, and 0.8270, respectively. An analysis of the extracted research method entities and their categories further shows that research methods in information science are becoming increasingly diverse, with emerging technologies and traditional methods coexisting, each with its own strengths.
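The three reported scores can be cross-checked against each other. The minimal sketch below assumes the first figure (0.8082) is precision in the usual entity-extraction sense and that F1 is the standard harmonic mean of precision and recall; the abstract does not state the formula, and the function and variable names are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for Chinese-BERT-wwm-ext.
# Reading the first value as precision is an assumption.
reported_precision = 0.8082
reported_recall = 0.8467
reported_f1 = 0.8270

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.4f}")  # prints: computed F1 = 0.8270
```

Under these assumptions the recomputed F1 matches the reported 0.8270 to four decimal places, which supports reading the first score as precision rather than accuracy.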

information science; corpus of research method; research method entities; research method recognition; fine-grained research method

Hao Jiayi (郝家亿), Wang Yuzhuo (王玉琢), Zhang Chengzhi (章成志)

School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094

School of Management, Anhui University, Hefei 230601

2025

Scientific Information Research (科技情报研究)

Year, volume (issue): 2025, 7(1)