Computer Engineering & Science, 2024, Vol. 46, Issue 2: 316-324. DOI: 10.3969/j.issn.1007-130X.2024.02.014

Distant supervision relation extraction based on type attention and GCN

Zhang Huan (张欢), Li Weijiang (李卫疆)

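Author information

  • 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan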

Abstract

Distant supervision relation extraction automatically aligns natural language text with a knowledge base to generate labeled training data, removing the need for manual annotation. Most current distant supervision research pays little attention to long-tail data, so most sentence bags obtained by distant supervision contain too few sentences to reflect the data truthfully and comprehensively. This paper therefore proposes PG+PTATT, a distant supervision relation extraction model based on a position-type attention mechanism and a graph convolutional network. According to the similarity between sentence bags, a graph convolutional network (GCN) aggregates the implicit high-order features of similar bags and refines each bag so that it carries richer and more comprehensive feature information. In addition, a position-type attention mechanism (PTATT) is constructed to address the wrong-label problem in distant supervision relation extraction: it models the position relationships and type relationships between entity words and non-entity words to reduce the influence of noisy words. The model is evaluated on the New York Times (NYT) dataset, and the experimental results show that it effectively addresses these problems in distant supervision relation extraction and improves the accuracy of relation extraction.
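
As a rough illustration of the aggregation idea described in the abstract, the sketch below applies one graph-convolution step over a similarity graph of sentence-bag feature vectors. It is not the authors' PG+PTATT implementation. The function names (`cosine`, `similarity_adjacency`, `gcn_layer`), the cosine-similarity threshold used to connect bags, the symmetric normalization, and the toy feature vectors are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_adjacency(bags, threshold):
    """Connect two bags when their cosine similarity reaches the threshold.

    Self-loops are always added so that each bag keeps its own features.
    The thresholding rule is an assumption, not taken from the paper.
    """
    n = len(bags)
    return [[1.0 if i == j or cosine(bags[i], bags[j]) >= threshold else 0.0
             for j in range(n)] for i in range(n)]

def gcn_layer(bags, adj, weight):
    """One graph-convolution step: ReLU(D^-1/2 A D^-1/2 X W)."""
    n, d_in, d_out = len(bags), len(weight), len(weight[0])
    deg = [sum(row) for row in adj]  # degree >= 1 because of self-loops
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * bags[k][f] for k in range(n)) for f in range(d_in)]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(d_in)))
             for o in range(d_out)] for i in range(n)]

# Toy example: bags 0 and 1 are similar, bag 2 is isolated.
bags = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
adj = similarity_adjacency(bags, threshold=0.7)
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder weights; learned in practice
refined = gcn_layer(bags, adj, identity)
```

In this toy run the two similar bags end up sharing an averaged representation, while the isolated bag keeps its own features. This is the sense in which a bag with few sentences can borrow information from similar bags.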

Key words

distant supervision/relation extraction/graph convolutional network/attention mechanism/type relationship/sentence bag


Funding

National Natural Science Foundation of China (62066022)

Publication year

2024
Computer Engineering & Science
College of Computer, National University of Defense Technology

CSTPCD; Peking University Core Journal
Impact factor: 0.787
ISSN: 1007-130X
Number of references: 1