中文信息学报 (Journal of Chinese Information Processing), 2024, Vol. 38, Issue 1: 45-56.

Real-time Name Disambiguation with Subgraph Enhancement

Han Tianyi¹, Cheng Xinyu¹, Zhang Fanjin², Chen Bo² (韩天翼, 程欣宇, 张帆进, 陈波)

Author information

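  • 1. State Key Laboratory of Public Big Data, Guizhou University, Guiyang, Guizhou 550025; Engineering Research Center of Text Computing & Cognitive Intelligence, Ministry of Education, Guizhou University, Guiyang, Guizhou 550025
  • 2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084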

Abstract

Real-time name disambiguation aims to assign each newly added paper that carries an ambiguous author name to the correct author among the same-name candidates, both accurately and in real time. Existing name disambiguation algorithms mainly address the cold-start setting, and little work has explored how to solve real-time name disambiguation both efficiently and effectively. This paper proposes RND-all, a subgraph-enhanced real-time name disambiguation model that improves accuracy by efficiently fusing the structural features linking the paper to be disambiguated with the candidate authors. The model constructs subgraphs from the attributes of the paper to be disambiguated and from the profiles of the same-name candidate authors, respectively, and then uses a subgraph structure feature extraction framework to compute graph-correlation features. Semantic matching features are computed through feature engineering and semantic text embedding, and ensemble learning integrates the structural and semantic information. Experimental results show that incorporating structural information effectively improves the accuracy of real-time name disambiguation, and RND-all ranks first on the test set of WhoIsWho, a million-scale name disambiguation benchmark.
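The abstract describes the pipeline only at a high level: build subgraphs for the new paper and for each candidate, extract structural features, compute semantic features, and fuse both with an ensemble. The sketch below illustrates that flow under stated assumptions and is not RND-all's implementation. Everything in it is hypothetical: the `Paper`/`Candidate` records, coauthor and venue nodes as the only subgraph attributes, simple overlap statistics standing in for the paper's subgraph structure feature extraction framework, a bag-of-words cosine standing in for text embeddings, and a score-averaging ensemble of two hand-written scorers standing in for a trained ensemble model. The toy author names and IDs are made up.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Paper:
    # Hypothetical record: title, coauthor names and venue only.
    title: str
    coauthors: list[str]
    venue: str


@dataclass
class Candidate:
    # Hypothetical profile: an author ID plus the papers already assigned to it.
    author_id: str
    papers: list[Paper] = field(default_factory=list)


def attribute_nodes(p: Paper) -> set[str]:
    """Neighbours of a paper node in its subgraph: coauthor and venue nodes."""
    nodes = {"co:" + a.lower() for a in p.coauthors}
    nodes.add("venue:" + p.venue.lower())
    return nodes


def candidate_subgraph(c: Candidate) -> Counter:
    """Attribute nodes reachable from the candidate, counted over all their papers."""
    nodes: Counter = Counter()
    for q in c.papers:
        nodes.update(attribute_nodes(q))
    return nodes


def structural_features(p: Paper, c: Candidate) -> list[float]:
    """Overlap statistics between the two subgraphs (assumed stand-in features)."""
    p_nodes = attribute_nodes(p)
    c_nodes = candidate_subgraph(c)
    shared = p_nodes & set(c_nodes)
    union = p_nodes | set(c_nodes)
    jaccard = len(shared) / len(union) if union else 0.0
    # Fraction of the candidate's attribute occurrences that the new paper touches.
    weighted = sum(c_nodes[n] for n in shared) / max(1, sum(c_nodes.values()))
    return [jaccard, weighted]


def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_features(p: Paper, c: Candidate) -> list[float]:
    """Title similarity to the candidate's profile (stand-in for text embeddings)."""
    profile: Counter = Counter()
    for q in c.papers:
        profile.update(bag_of_words(q.title))
    return [cosine(bag_of_words(p.title), profile)]


def ensemble_score(p: Paper, c: Candidate) -> float:
    """Score-averaging ensemble of a structural scorer and a semantic scorer."""
    structural = sum(structural_features(p, c)) / 2  # mean of the two overlap features
    semantic = semantic_features(p, c)[0]
    return (structural + semantic) / 2


def assign(p: Paper, candidates: list[Candidate]) -> str:
    """Return the ID of the best-scoring same-name candidate."""
    return max(candidates, key=lambda c: ensemble_score(p, c)).author_id


candidates = [
    Candidate("li_wei_0", [Paper("graph neural networks for citation analysis", ["Zhang San"], "KDD")]),
    Candidate("li_wei_1", [Paper("protein folding with deep learning", ["Wang Wu"], "Nature")]),
]
new_paper = Paper("subgraph neural networks for citation graphs", ["Zhang San"], "KDD")
print(assign(new_paper, candidates))  # → li_wei_0
```

Scoring each candidate independently and taking the arg-max mirrors the task as stated in the abstract (pick the correct author among same-name candidates). A production system would more likely feed the full feature vector to a trained model rather than average hand-written scores.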

Key words

real-time name disambiguation/graph neural network/structural information/ensemble learning

Publication year

2024
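
Journal of Chinese Information Processing (中文信息学报)
Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences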

Indexed in: CSTPCD, CHSSCD, Peking University Core Journals (北大核心)
Impact factor: 0.8
ISSN:1003-0077
References: 29