A knowledge graph is a structured knowledge base comprising various types of knowledge or data units obtained through extraction and other processes; it is used to describe and represent information such as entities, concepts, facts, and relationships. The limitations of Natural Language Processing (NLP) technology and the noise present in source texts affect the accuracy of information extraction. Existing Knowledge Graph Completion (KGC) methods typically exploit either structural information or textual semantic information alone, disregarding the combination of both available across the entire knowledge graph. Hence, a KGC model based on contrastive learning and language-model-enhanced embedding is proposed. The input entities and relationships are encoded with a pretrained language model to capture their textual semantic information, while the distance scoring function of a translation model captures the structural information of the knowledge graph. Two negative sampling methods are incorporated into contrastive learning to train the model, improving its ability to distinguish positive from negative samples. Experimental results show that, compared with the Bidirectional Encoder Representations from Transformers for Knowledge Graph completion (KG-BERT) model, the proposed model improves the Hits@10 indicator (the average proportion of triples ranked in the top 10) by 31% and 23% on the WN18RR and FB15K-237 datasets, respectively, demonstrating its superiority over similar models.
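The abstract does not give the exact formulation, but the three ingredients it names (pretrained-language-model embeddings, a translation-model distance score, and contrastive training with in-batch negatives) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the choice of bert-base-uncased, [CLS] pooling, the L2 (TransE-style) distance, and the InfoNCE temperature are all assumptions.

```python
# Sketch (PyTorch + HuggingFace Transformers) of the abstract's three ideas:
# PLM text embeddings, a TransE-style distance score, and an InfoNCE-style
# contrastive loss over in-batch negatives. All hyperparameters are assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed PLM
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    """Encode a list of strings into L2-normalized [CLS] embeddings."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = encoder(**batch).last_hidden_state[:, 0]  # [CLS] pooling (assumption)
    return F.normalize(out, dim=-1)

def transe_score(h, r, t):
    """TransE-style structural score: negative L2 distance, larger is better."""
    return -torch.norm(h + r - t, p=2, dim=-1)

def contrastive_loss(h, r, t, temperature=0.05):
    """InfoNCE with in-batch negatives: each (head, relation) pair should
    score its own tail higher than every other tail in the batch."""
    # logits[i, j] = similarity of (h_i + r_i) to tail t_j
    logits = -torch.cdist(h + r, t, p=2) / temperature
    labels = torch.arange(h.size(0))  # diagonal entries are the true tails
    return F.cross_entropy(logits, labels)

# Toy usage with two triples, e.g. (Paris, capital_of, France).
heads = embed(["Paris", "Berlin"])
rels = embed(["capital of", "capital of"])
tails = embed(["France", "Germany"])
pos_scores = transe_score(heads, rels, tails)  # structural plausibility
loss = contrastive_loss(heads, rels, tails)
loss.backward()  # gradients flow back into the PLM encoder
```

In this sketch the in-batch construction stands in for one of the paper's two negative sampling strategies; the abstract does not specify what the second strategy is, so it is not reproduced here.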
Key words
Knowledge Graph Completion (KGC)/knowledge graph/contrastive learning/pretrained language model/link prediction