Predicting Sleeping Beauty Coefficients Based on the ts2net Model
[Objective] Existing methods for identifying Sleeping Beauty papers depend on long-term citation curves. This study explores a method for predicting Sleeping Beauty coefficients from early-stage citation curves. [Methods] We propose a prediction method based on the ts2net model. First, the citation curve of each paper is transformed into three types of complex networks: the natural visibility graph (NVG), the horizontal visibility graph (HVG), and the quantile graph (QG). Second, five features are extracted from each network: average degree, average path length, clustering coefficient, number of communities, and modularity. Finally, the prediction method is built on machine learning models. [Results] Experiments used 89,681 computer science papers collected from the Web of Science. Both the B coefficient and the Bcp coefficient correlate with the complex network features. Among the machine learning models tested, the multilayer perceptron (MLP) and gradient boosted regression trees (GBRT) performed best: MLP was the best at predicting the Bcp coefficient, with an error of 5.90%, and GBRT was the best at predicting the B coefficient, with an error of 31.18%. [Limitations] Prediction accuracy declines for papers whose citation frequency fluctuates strongly or whose sleeping period is long. In addition, the predicted Sleeping Beauty coefficient only indicates how likely a paper is to be a Sleeping Beauty; it must be combined with a downstream Sleeping Beauty identification model or task for a final judgment. [Conclusions] This study shows that it is feasible to convert citation curves into complex networks and to use the resulting network features to predict Sleeping Beauty coefficients.
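
The curve-to-network step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the ts2net package: it builds the NVG and HVG from their standard visibility definitions in plain Python and computes three of the five features (average degree, average path length, and clustering coefficient, taken here as the average local clustering coefficient, which is an assumption). The QG, the number of communities, and modularity are omitted because the abstract does not specify the quantile count or the community detection algorithm. All function names and the citation counts are hypothetical.

```python
from collections import deque
from itertools import combinations


def visibility_graph(series, horizontal=False):
    """Return an adjacency dict {node: set(neighbours)} for a time series.

    NVG (horizontal=False): i and j are linked when every point between them
    lies strictly below the straight line joining (i, y_i) and (j, y_j).
    HVG (horizontal=True): i and j are linked when every point between them
    is strictly lower than both y_i and y_j.
    """
    n = len(series)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        between = range(i + 1, j)
        if horizontal:
            limit = min(series[i], series[j])
            visible = all(series[k] < limit for k in between)
        else:
            slope = (series[j] - series[i]) / (j - i)
            visible = all(series[k] < series[i] + slope * (k - i) for k in between)
        if visible:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def average_degree(adj):
    """Mean number of neighbours per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all ordered node pairs (BFS per node).

    Assumes a connected graph with at least two nodes; visibility graphs are
    connected because consecutive time points are always linked.
    """
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    n = len(adj)
    return total / (n * (n - 1))


def clustering_coefficient(adj):
    """Average local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
            total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Hypothetical yearly citation counts for one paper (not data from the study).
citations = [0, 1, 0, 2, 1, 3, 8, 15]
nvg = visibility_graph(citations)
hvg = visibility_graph(citations, horizontal=True)
features = {
    name: (average_degree(g), average_path_length(g), clustering_coefficient(g))
    for name, g in (("NVG", nvg), ("HVG", hvg))
}
```

In a pipeline like the one the abstract describes, a feature vector of this kind, computed for each paper's early citation curve, would be the input to a regression model such as MLP or GBRT.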