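Abstract (translated from the Chinese)
Entity and relation extraction is a key technology for turning massive text data into structured knowledge and for automatically building large-scale knowledge graphs. Because head and tail entity information strongly influences relation extraction, this paper uses an attention mechanism to fuse entity-pair information into the relation extraction process and proposes a joint entity and relation extraction model based on an entity-pair specific attention mechanism (EPSA). First, a bidirectional long short-term memory network (Bi-LSTM) combined with a conditional random field (CRF) performs entity recognition. Second, the extracted entities are paired and their information is fused into a unified embedding, which is used to compute an attention value for each word in the sentence. Third, a sentence encoding module based on the entity-pair attention mechanism produces a sentence representation, which is then explicitly fused with the entity-pair information to obtain an enhanced sentence representation. Finally, relation extraction is completed by classification. The EPSA model is evaluated on the public NYT and WebNLG datasets. Experimental results show that, compared with current mainstream joint extraction models, EPSA improves the F1 score on both datasets, reaching 84.5% and 88.5% respectively, and resolves the single entity overlap problem.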
Abstract
Entity and relation extraction is a key technology for automatically building large-scale knowledge graphs from massive text data. Because entity information affects the discrimination of relation types, this paper proposes a joint entity and relation extraction model based on an entity-pair specific attention mechanism (EPSA). First, entity recognition is performed with a Bi-directional Long Short-Term Memory network (Bi-LSTM) and Conditional Random Fields (CRF). Then the extracted entities are combined into entity pairs and transformed into a unified embedding. The sentence representation is obtained from the entity-pair specific attention mechanism together with the entity-pair embedding. Finally, relation extraction is completed by a classification step. Experimental results on the NYT and WebNLG datasets show that the proposed method outperforms the baselines, achieving F1 scores of 84.5% and 88.5%, respectively.
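
The attention step described above can be illustrated with a minimal sketch. The abstract does not give the scoring function, the way the head and tail entities are fused, or the form of the explicit fusion, so the choices below (element-wise averaging of the two entity vectors, scaled dot-product scoring, and concatenation) are assumptions made for illustration only. The function name `entity_pair_attention` and its parameters are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entity_pair_attention(token_states, head_vec, tail_vec):
    """Return (attention weights, enhanced sentence representation).

    token_states: one vector per token (e.g. Bi-LSTM outputs), each of length d.
    head_vec, tail_vec: length-d vectors for the head and tail entity.
    """
    # Assumption: fuse the entity pair into one embedding by averaging.
    pair = [(h + t) / 2.0 for h, t in zip(head_vec, tail_vec)]
    d = len(pair)

    # Assumption: scaled dot-product score between each token and the pair.
    scores = [
        sum(x * p for x, p in zip(tok, pair)) / math.sqrt(d)
        for tok in token_states
    ]
    weights = softmax(scores)

    # Sentence representation: attention-weighted sum of the token vectors.
    sentence = [
        sum(w * tok[j] for w, tok in zip(weights, token_states))
        for j in range(d)
    ]

    # Assumption: "explicit fusion" is modelled as simple concatenation.
    enhanced = sentence + pair
    return weights, enhanced


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, enhanced = entity_pair_attention(tokens, [1.0, 0.0], [1.0, 0.0])
    print(weights)
    print(enhanced)
```

In a full model, the enhanced vector would be passed to a relation classifier once per candidate entity pair, so the same sentence yields a different representation for each pair.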