Computer Science (计算机科学), 2024, Vol. 51, Issue z1: 204-209. DOI: 10.11896/jsjkx.230800081

Document-level Relation Extraction Integrating Evidence Sentence Extraction

AN Xiankua (安先跨), XIAO Rong (肖蓉), YANG Xiao (杨肖)

Author information

  • 1. School of Computer Science and Information Engineering, Hubei University, Wuhan 430062

Abstract

Document-level relation extraction, a key task in natural language processing, aims to accurately extract semantic relations between entity pairs from long documents. Traditional methods typically take the entire document as input. In practice, however, humans can predict the relation between an entity pair from only a subset of the document's sentences, known as evidence sentences. Many existing methods make use of evidence sentences, but they struggle to retrieve all of them and to fully exploit them. To address this issue, we introduce a more efficient and accurate evidence sentence selection method that fuses a formula-based strategy with a sentence-deletion-based strategy, and we integrate evidence extraction into both training and inference. This directs the document-level relation extraction model to focus on the important sentences while still capturing the complete information in the document. Experiments show that the improved model outperforms existing models on public datasets.

Key words

Document-level/Relation extraction/Evidence sentences/Bilinear layer

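Funding

Open Fund of the Hubei Key Laboratory of Big Data in Science and Technology (Wuhan Documentation and Information Center, Chinese Academy of Sciences) (E1KF291005)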

Publication year

2024
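Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly Southwest Information Center, Ministry of Science and Technology)

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.944
ISSN: 1002-137X
Number of references: 21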