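To address the low accuracy of relation extraction from fishery healthy-aquaculture standard texts, which stems from strong domain specificity and complex semantics, this paper proposes a complex relation extraction method for fishery healthy-aquaculture standards based on an improved BiRTE. To model entities and their semantic associations, RoBERTa is used as the encoder, with whole-word masking and dynamic masking enhancing the word-vector feature representation. On this basis, a self-attention mechanism (Self-Attention, SelfATT) is integrated to combine and focus entity features and relation features, strengthening the link between entity extraction and relation prediction and thereby improving the accuracy of extraction from fishery standard texts. The results show that the proposed model (RoBERTa-BiRTE-SelfATT) achieves precision, recall, and F1 of 95.9%, 95.4%, and 95.7%, respectively, on complex relation extraction from fishery standards, improvements of 4.2%, 3.1%, and 3.8% over the BiRTE model. The study indicates that RoBERTa-BiRTE-SelfATT can effectively address the inaccurate recognition of domain-specific terms and the difficulty of extracting entity relations caused by semantic complexity in fishery standard texts, and is an effective method for complex relation extraction from fishery standards.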
Abstract
A complex relation extraction method for healthy-aquaculture standards, based on an improved BiRTE model, is proposed to address the inaccurate recognition of domain-specific nouns and the semantic complexity that hinder entity relation extraction. The BiRTE model, which reduces error propagation through bidirectional extraction and exhibits strong relation extraction capabilities, was adopted as the foundational model. To enhance the model's ability to extract information from fishery standard texts, RoBERTa was used as the encoder; it encodes the domain-specific nouns in these texts using whole-word masking and dynamic masking, enriching the word-vector information and enhancing the feature representation. On this basis, self-attention is integrated to combine entity features and relation features, strengthening the connection between entity extraction and relation prediction and thereby improving the accuracy of relation extraction. The proposed model achieved a precision of 95.9%, a recall of 95.4%, and an F1 score of 95.7% on the extraction of complex relations in fishery standards, improvements of 4.2%, 3.1%, and 3.8%, respectively, over the original BiRTE model. These findings indicate that the improved BiRTE-based model is an effective method for extracting complex relations in fishery standards: it addresses both the inaccurate identification of proper nouns and the difficulty of extracting entity relations caused by semantic complexity in fishery standard texts.
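The two masking strategies named above can be illustrated with a minimal sketch. It is not the authors' code and not RoBERTa's actual pre-training pipeline; the function names `whole_word_mask` and `dynamic_masks`, the 15% masking rate, and the sub-token split of the example phrase are all assumptions made for illustration. Whole-word masking replaces every sub-token of a selected word together, and dynamic masking draws a fresh mask on each pass over the data instead of fixing one in advance.

```python
import random

MASK = "[MASK]"

def whole_word_mask(words, mask_prob=0.15, rng=None):
    """Mask whole words: all sub-tokens of a selected word are masked together.

    `words` is a list of words, each given as a list of sub-tokens.
    Returns the masked token sequence and the prediction targets
    (the original token at masked positions, None elsewhere).
    """
    if rng is None:
        rng = random.Random()
    tokens, labels = [], []
    for word in words:
        masked = rng.random() < mask_prob  # one decision per word, not per sub-token
        for tok in word:
            tokens.append(MASK if masked else tok)
            labels.append(tok if masked else None)
    return tokens, labels

def dynamic_masks(words, epochs, mask_prob=0.15, seed=0):
    """Dynamic masking: sample a new mask for every pass over the sentence."""
    rng = random.Random(seed)
    return [whole_word_mask(words, mask_prob, rng) for _ in range(epochs)]

# Hypothetical sub-token segmentation of a domain term; illustrative only.
sentence = [["dis", "##solved"], ["oxy", "##gen"], ["concen", "##tration"]]

if __name__ == "__main__":
    for epoch, (tokens, _) in enumerate(dynamic_masks(sentence, epochs=3, mask_prob=0.5)):
        print(epoch, tokens)
```

Because the decision is made once per word, a multi-token domain term is either fully visible or fully hidden, so the encoder has to predict the term from its context rather than from its own fragments.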