TCM Named Entity Recognition Model Combining BERT Model and Lexical Enhancement (融合BERT模型与词汇增强的中医命名实体识别模型)

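Abstract (translated from Chinese): Research on named entity recognition (NER) for traditional Chinese medicine (TCM) is scarce. Most existing work is based on Chinese-language medical records and performs poorly on case texts written by TCM practitioners. To address the dense named entities and the fuzzy, hard-to-delimit entity boundaries found in TCM cases, this paper proposes LEBERT-BILSTM-CRF, a TCM NER method that fuses lexical enhancement with a pre-trained model. The method feeds lexical information into the BERT model for feature learning, which helps delimit word boundaries and distinguish word categories, and thereby improves the accuracy of NER on TCM case records. On the TCM case dataset constructed in the paper, covering 10 entity categories, the proposed LEBERT-BILSTM-CRF model reaches an overall precision, recall, and F1 of 88.69%, 87.4%, and 88.1%, respectively, higher than common NER models such as BERT-CRF and LEBERT-CRF.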

English title: TCM Named Entity Recognition Model Combining BERT Model and Lexical Enhancement

English abstract: There is little research on TCM named entity recognition, and most of it is based on Chinese medical cases and does not perform well on TCM case texts. Aiming at the dense named entities and fuzzy boundaries in TCM cases, this paper proposes a TCM named entity recognition method, LEBERT-BILSTM-CRF, which combines lexical enhancement and a pre-training model. The method is optimized from the perspective of fusing vocabulary enhancement with the pre-training model: vocabulary information is input into the BERT model for feature learning, so as to divide word class boundaries and distinguish word class attributes, and to improve the accuracy of named entity recognition on TCM medical cases. Experiments show that when ten entities are identified on the TCM case dataset constructed in this paper, the overall precision, recall, and F1 of the TCM case named entity recognition model based on LEBERT-BILSTM-CRF are 88.69%, 87.4%, and 88.1%, respectively, which is higher than common named entity recognition models such as BERT-CRF and LEBERT-CRF.
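The abstract says that vocabulary information is fed into BERT alongside the characters. A minimal sketch of the preprocessing step this usually implies in lexicon-enhanced models is shown below: each character is paired with the lexicon words that cover it. The abstract does not describe the authors' matching procedure, so the function name `match_lexicon_words`, the `max_word_len` limit, and the four-word sample lexicon are all hypothetical and serve only as an illustration.

```python
from typing import List, Set


def match_lexicon_words(
    sentence: str, lexicon: Set[str], max_word_len: int = 4
) -> List[List[str]]:
    """For every character, collect the lexicon words that cover it.

    Hypothetical helper: single characters are skipped, and candidate
    words are limited to max_word_len characters.
    """
    matched: List[List[str]] = [[] for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for length in range(2, max_word_len + 1):
            end = start + length
            if end > n:
                break  # candidate would run past the end of the sentence
            word = sentence[start:end]
            if word in lexicon:
                # Attach the matched word to each character it spans.
                for i in range(start, end):
                    matched[i].append(word)
    return matched


if __name__ == "__main__":
    # Illustrative lexicon only; not taken from the paper's dataset.
    sample_lexicon = {"舌苔", "薄白", "脉弦", "苔薄白"}
    sample_sentence = "舌苔薄白脉弦"
    pairs = match_lexicon_words(sample_sentence, sample_lexicon)
    for char, words in zip(sample_sentence, pairs):
        print(char, words)
```

In a full model, the words attached to each character would be embedded and combined with that character's hidden state inside BERT, and the resulting sequence would then pass through the BiLSTM and CRF layers; those stages are outside the scope of this sketch.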

English keywords:

Natural language processing; Chinese medicine case; Vocabulary enhancement; BERT; BiLSTM-CRF

Authors:

李旻哲 (Li Minzhe), 殷继彬 (Yin Jibin)

Affiliation:

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500

Keywords (translated from Chinese):

Natural language processing; TCM cases; lexical enhancement; BERT; BiLSTM-CRF

Publication year:

2024

DOI:

10.11896/jsjkx.230900030

Journal: Computer Science (计算机科学)

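Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)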

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2024, 51(z1)

Number of references: 15