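[Objective] To make effective use of existing organization authority files and to address the selection and evaluation of authority files from multiple sources, as well as the lack of mappings between files and redundant relationships. [Methods] Based on a survey and review of existing organization authority files and related research, we construct a fusion model comprising six steps: metadata collection and analysis, metadata framework fusion, relationship fusion, alias fusion, construction of the organization authority file data model, and verification of fusion results. The model fuses organization authority files from multiple sources and is verified using part of the organization data from Dimensions, Scopus, and Web of Science. [Results] Fusion performance was evaluated with multiple metrics: the F1 value reaches 0.97 or above for first-, second-, and third-level organizations, and Dimensions contributes most to the fusion. An authority file containing 5,128 organizations was constructed. [Limitations] Only superior-subordinate relationships between organizations are considered; circular references among relationships and the selection of the authorized organization name have not yet been studied in depth. Only part of the organizations from three sources were used for verification, so the model's generalization to larger datasets remains to be verified. [Conclusions] The model is effective for fusing organization authority files from multiple source databases.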
Fusion of Organization Authority Files from Multiple Sources
[Objective] This paper aims to make effective use of existing organization authority files (OAFs) by addressing the selection and evaluation of multi-source OAFs, the lack of mappings between OAFs, and redundant relationships. [Methods] First, we examined existing OAFs and related studies. Then, we constructed a fusion model with six steps: metadata collection and analysis, metadata framework fusion, organization relationship fusion, alias fusion, OAF data model construction, and verification of fusion results. Finally, we verified the model using part of the organization data from Dimensions, Scopus, and Web of Science. [Results] Evaluated with multiple metrics, the model's F1 value reached 0.97 or above for first-, second-, and third-level organizations, and Dimensions made the largest contribution to the fusion. We constructed an OAF containing 5,128 organizations. [Limitations] The organization relationships include only parent-child relations. Circular references among relationships and the choice of the standard organization name have not yet been studied in depth. Only part of the organizations from three sources were used for verification, so the model's generalization to larger datasets remains to be verified. [Conclusions] The model can effectively fuse OAFs from multiple sources.
Organization Authority File Fusion; Metadata Framework Fusion; Multi-source OAF; Scientific Research Entity Authority
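To make the alias fusion and F1-based verification steps named in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that two records refer to the same organization when they share an alias after simple normalization, and that F1 is computed over matched record pairs against a hand-labeled gold standard. All record identifiers, organization names, and function names are hypothetical.

```python
from itertools import combinations


def normalize(name):
    """Lowercase and collapse whitespace so trivially different aliases compare equal."""
    return " ".join(name.lower().split())


def fuse_by_alias(records):
    """Group record ids that share at least one normalized alias (union-find).

    `records` is a list of (record_id, [aliases]) tuples; this matching rule
    is an illustrative assumption, not the rule used in the paper.
    """
    parent = {rid: rid for rid, _ in records}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # normalized alias -> first record id that used it
    for rid, aliases in records:
        for alias in aliases:
            key = normalize(alias)
            if key in first_seen:
                parent[find(rid)] = find(first_seen[key])
            else:
                first_seen[key] = rid

    clusters = {}
    for rid, _ in records:
        clusters.setdefault(find(rid), set()).add(rid)
    return list(clusters.values())


def matched_pairs(clusters):
    """Expand clusters into the set of unordered record pairs they imply."""
    return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}


def f1_score(predicted, gold):
    """Pairwise F1: harmonic mean of precision and recall over matched pairs."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical records from three sources (ids and names are made up).
records = [
    ("dim:1", ["Example University", "EU"]),
    ("sco:7", ["example  university"]),
    ("wos:3", ["Example Univ", "EU"]),
    ("dim:2", ["Sample Institute"]),
    ("sco:8", ["Sample Inst."]),
]
# Hypothetical gold standard: two real organizations.
gold = matched_pairs([{"dim:1", "sco:7", "wos:3"}, {"dim:2", "sco:8"}])

clusters = fuse_by_alias(records)
score = f1_score(matched_pairs(clusters), gold)
```

In this toy data the exact-alias rule merges the three "Example University" records but misses the "Sample Institute" / "Sample Inst." pair, so precision stays perfect while recall drops. That is the kind of gap a pairwise F1 measure is meant to expose.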