Named Entity Recognition in the Oil and Gas Domain Based on the BERT-BiLSTM-CRF Model

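To address the inaccurate extraction of entity feature information and the low recognition efficiency of traditional named entity recognition methods when constructing oil and gas domain knowledge graphs, a named entity recognition method based on the BERT-BiLSTM-CRF model is proposed. The method first uses the BERT (bidirectional encoder representations from transformers) pre-trained model to obtain word vectors that capture the semantics of the input sequence; it then feeds the trained word vectors into a bi-directional long short-term memory (BiLSTM) network to further extract contextual features; finally, relying on the labeling rules and sequence decoding ability of conditional random fields (CRF), it outputs the maximum-probability sequence labeling result, yielding a model framework for named entity recognition in the oil and gas domain. The BERT-BiLSTM-CRF model was compared with two other named entity recognition models (BiLSTM-CRF and BiLSTM-Attention-CRF) on a self-built dataset containing more than 30,000 text corpus entries and four entity types. The experimental results show that the precision (P), recall (R), and F1 value of the BERT-BiLSTM-CRF model reach 91.3%, 94.5%, and 92.9%, respectively, and that its entity recognition performance is better than that of the other two models.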
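As a quick consistency check on the reported figures, the F1 value can be recomputed from the stated P and R, assuming the standard definition of F1 as the harmonic mean of the two (the abstract does not spell out the formula). The function name `f1_score` is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: P = 91.3%, R = 94.5%
f1 = f1_score(0.913, 0.945)
print(f"F1 = {f1:.1%}")  # prints "F1 = 92.9%"
```

Under that assumption, the recomputed value agrees with the 92.9% given in the abstract.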
Named entity recognition in oil and gas domain based on the BERT-BiLSTM-CRF model
To address the insufficient extraction of feature information and the low recognition efficiency encountered when constructing knowledge graphs in the oil and gas domain, this paper proposes a named entity recognition method based on the BERT-BiLSTM-CRF model. The method first uses the BERT (bidirectional encoder representations from transformers) pre-trained model to obtain word vectors that capture the semantics of the input sequence. It then feeds the trained word vectors into a bi-directional long short-term memory (BiLSTM) model to further extract contextual features. Finally, using the labeling rules and sequence decoding ability of conditional random fields (CRF), it outputs the maximum-probability sequence labeling result, giving a model framework for named entity recognition in the oil and gas domain. The model is compared with two commonly used named entity recognition models on a self-built dataset of more than 30 000 text corpus entries and four entity types. The experimental results show that the precision (P), recall (R), and F1 value of the proposed model reach 91.3%, 94.5%, and 92.9%, respectively, and that its entity recognition performance is superior to that of the other two models.
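The final step described above, in which the CRF layer outputs the maximum-probability tag sequence, is commonly implemented with Viterbi decoding. The following is a minimal sketch in plain Python, not the authors' implementation. The BIO tag set with a single placeholder entity type `ENT`, the toy emission scores (standing in for BiLSTM outputs), and the transition scores are all assumptions made for illustration; the paper's four entity types are not named in the abstract.

```python
from typing import List, Sequence, Tuple

# Hypothetical tag set: BIO scheme with one placeholder entity type.
TAGS = ["O", "B-ENT", "I-ENT"]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path and its score.

    emissions[t][j]   : score of tag j at token t (e.g. from a BiLSTM)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])          # best score ending in each tag
    backpointers: List[List[int]] = []  # best previous tag per step

    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final tag.
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


# Toy scores for a three-token sentence (assumed values).
EMISSIONS = [
    [1.0, 0.5, 0.0],
    [0.2, 0.1, 1.0],
    [1.0, 0.0, 0.3],
]
# A large negative score forbids the invalid move O -> I-ENT.
TRANSITIONS = [
    [0.0, 0.0, -1e4],  # from O
    [0.0, 0.0, 0.0],   # from B-ENT
    [0.0, 0.0, 0.0],   # from I-ENT
]

path, score = viterbi_decode(EMISSIONS, TRANSITIONS)
print([TAGS[i] for i in path])  # ['B-ENT', 'I-ENT', 'O']
```

Picking the best tag per token independently would give O, I-ENT, O here, which is not a valid BIO sequence. The transition scores steer the decoder to B-ENT, I-ENT, O instead, which illustrates why a CRF layer is placed on top of the BiLSTM outputs.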

oil and gas domain; named entity recognition; bidirectional encoder representations from transformers (BERT); bi-directional long short-term memory; conditional random fields; BERT-BiLSTM-CRF model

Gao Guozhong (高国忠), Li Yu (李宇), Hua Yuanpeng (华远鹏), Wu Wenkuang (吴文旷)

School of Geophysics and Petroleum Resources, Yangtze University, Wuhan 430100, Hubei, China

PetroChina Research Institute of Petroleum Exploration and Development, Beijing 100083, China

oil and gas domain; named entity recognition; BERT; bi-directional long short-term memory network; conditional random field; BERT-BiLSTM-CRF model

China University Industry-University-Research Innovation Fund, Ministry of Education

2021BCF03006

2024

Journal of Yangtze University (Natural Science Edition)
Yangtze University

Impact factor: 0.335
ISSN: 1673-1409
Year, volume (issue): 2024, 21(1)