Distant Supervision Relation Extraction Based on PCNN Similar Bag Attention
In relation extraction, distant supervision automatically generates training data by aligning a knowledge base (KB) with text, removing the need for manual annotation. However, distant supervision inevitably introduces wrongly labeled instances. To address this problem, this paper proposes a distant supervision relation extraction method based on PCNN (piecewise convolutional neural network) with similar sentence-bag attention (PCNN-PATT-SBA). The model introduces a position attention mechanism (PATT) based on the Gaussian distribution, which models the positional relationship between non-entity words and entity words and assigns a corresponding weight to each word in a sentence, thereby reducing the influence of noise words. In addition, based on the feature similarity between sentence bags, the paper proposes a similar sentence-bag attention mechanism (SBA), which fuses the features of similar bags to alleviate the lack of information in bags that contain only a single sentence. Experimental results on the New York Times (NYT) dataset demonstrate the effectiveness of the proposed method, which improves P@N by 6.9% over the inter-bag attention model.
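The abstract only outlines the two attention mechanisms, so the following minimal sketch illustrates the general idea under stated assumptions: a Gaussian function of each token's distance to the entity positions re-weights the word embeddings (PATT), and a softmax over cosine similarities between bag-level features fuses similar bags (SBA). The kernel width, the way the two entity scores are combined, and the fusion formula are assumptions for illustration, not the paper's exact equations.

```python
# Illustrative sketch only; parameter choices (sigma, averaging of the two
# entity scores, cosine-softmax fusion) are assumptions, not the paper's formulas.
import numpy as np

def gaussian_position_attention(word_emb, head_pos, tail_pos, sigma=2.0):
    """Weight each word by a Gaussian of its distance to the entity mentions (PATT).

    word_emb : (seq_len, dim) word embeddings of one sentence
    head_pos, tail_pos : token indices of the head / tail entity
    """
    idx = np.arange(word_emb.shape[0])
    # Gaussian score w.r.t. each entity; words far from both entities get low weight.
    w_head = np.exp(-((idx - head_pos) ** 2) / (2 * sigma ** 2))
    w_tail = np.exp(-((idx - tail_pos) ** 2) / (2 * sigma ** 2))
    weights = (w_head + w_tail) / 2.0
    weights = weights / weights.sum()          # normalize over the sentence
    return word_emb * weights[:, None]         # re-weighted input to the PCNN encoder

def similar_bag_attention(bag_feats, target):
    """Fuse features of bags that are similar to the target bag (SBA).

    bag_feats : (n_bags, dim) one feature vector per sentence bag (e.g. PCNN output)
    target    : index of the bag to enrich
    """
    q = bag_feats[target]
    sims = bag_feats @ q / (np.linalg.norm(bag_feats, axis=1) * np.linalg.norm(q) + 1e-8)
    alpha = np.exp(sims) / np.exp(sims).sum()  # softmax over bag similarities
    return alpha @ bag_feats                   # similarity-weighted fused bag feature

# Toy usage with random vectors
emb = np.random.randn(10, 8)
weighted = gaussian_position_attention(emb, head_pos=2, tail_pos=7)
bags = np.random.randn(5, 16)
enriched = similar_bag_attention(bags, target=0)
```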
Keywords: distant supervision; position feature; similarity; attention mechanism; Gaussian distribution