Multi-disciplinary Citation Classification with Multiple Features

Citation classification is a primary means of understanding citation behavior in depth, and it plays an important role in scenarios such as document management, retrieval, and utilization. Building on a review of key research on citation behavior mechanisms and on citation classification, this study uses machine learning to explore citation classification further. The fields of the original dataset were supplemented and extended by matching records against bibliographic databases and by parsing documents. While constructing the citation classification models, four broad categories of features that may be related to citation classification were extracted, and feature selection was then performed with a simulated annealing algorithm. The results show that the random forest model built in this study performs well on both citation influence and citation function classification, outperforming both the support vector machine and a classifier that combines SciBERT with a linear layer. The model improves the performance of automatic multidisciplinary citation classification. The feature extraction and selection process, together with the exploration of relationships between citation categories and several factors, offers a useful reference for related research.
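The abstract names simulated annealing as the feature selection step but gives no parameters, so the following is only a minimal sketch of the general technique, not the authors' implementation. The function names, the cooling schedule, the single-flip neighborhood, and `toy_score` are all assumptions made for illustration. In a setting like the one described, `score` would presumably be something like the cross-validated performance of a random forest trained on the selected features; here a toy stand-in is used so the example runs with the standard library alone.

```python
import math
import random


def anneal_feature_subset(n_features, score, n_iter=500,
                          t_start=1.0, t_end=0.01, seed=0):
    """Search for a boolean feature mask that maximizes score(mask).

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Start from a random subset of features.
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = score(current)
    best, best_score = list(current), current_score

    for step in range(n_iter):
        # Geometric cooling from t_start down to t_end.
        frac = step / max(1, n_iter - 1)
        temperature = t_start * (t_end / t_start) ** frac

        # Neighbor: flip one randomly chosen feature in or out.
        candidate = list(current)
        i = rng.randrange(n_features)
        candidate[i] = not candidate[i]
        candidate_score = score(candidate)

        # Always accept improvements; accept a worse candidate with
        # probability exp(delta / T), which shrinks as T cools.
        delta = candidate_score - current_score
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = list(current), current_score

    return best, best_score


def toy_score(mask):
    # Hypothetical stand-in for classifier performance: features 0-2
    # help, every other selected feature carries a small penalty.
    useful = {0, 1, 2}
    return sum(1.0 if i in useful else -0.2
               for i, selected in enumerate(mask) if selected)


best_mask, best_score = anneal_feature_subset(8, toy_score)
print(best_mask, best_score)
```

Accepting occasional worse subsets early on is what lets the search leave local optima that a greedy forward or backward selection would get stuck in. The cost is that each step needs one model evaluation, which becomes expensive when the score is a cross-validated classifier.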

citation classification; citation content analysis; citation function; citation influence; disciplinary differences

Zheng Zhihan (郑智涵), Li Xinyu (李昕雨), Meng Fan (孟凡), Bu Yi (步一)


Department of Information Management, Peking University, Beijing 100871


National Social Science Fund of China, General Program (No. 20BTQ054)

2024

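Journal of the China Society for Scientific and Technical Information (情报学报)
China Society for Scientific and Technical Information; Institute of Scientific and Technical Information of China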


Indexed in: CSTPCD; CSSCI; CHSSCD; Peking University Core Journals
Impact factor: 1.296
ISSN: 1000-0135
Year, volume (issue): 2024, 43(6)