TCM Named Entity Recognition Model Combining BERT Model and Lexical Enhancement

Research on named entity recognition (NER) for traditional Chinese medicine (TCM) is scarce, and most existing work is based on Chinese-language medical records, so it performs poorly on case texts written by TCM practitioners. To address the dense named entities and the fuzzy, hard-to-delimit entity boundaries in TCM cases, this paper proposes LEBERT-BILSTM-CRF, a TCM NER method that combines lexical enhancement with a pre-trained model. The method feeds lexical information into the BERT model for feature learning, which helps delimit word boundaries and distinguish word categories, and thereby improves the accuracy of NER on TCM medical cases. Experiments on the TCM case dataset constructed in this paper, covering ten entity types, show that the LEBERT-BILSTM-CRF model achieves an overall accuracy, recall, and F1 of 88.69%, 87.4%, and 88.1%, respectively, higher than common NER models such as BERT-CRF and LEBERT-CRF.
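
The lexical-enhancement idea described in the abstract can be illustrated with a minimal sketch of the first step such methods typically need: pairing each character of a sentence with the lexicon words that cover it, so that word information is available alongside the character sequence. This is not the authors' implementation. The function name `match_lexicon`, the `max_word_len` parameter, and the toy lexicon are hypothetical, and the way the matched words are then injected into BERT is not shown.

```python
from collections import defaultdict


def match_lexicon(sentence, lexicon, max_word_len=4):
    """Pair every character with the lexicon words that span it.

    Hypothetical helper for illustration only; a real system would
    likely use a trie and a much larger TCM lexicon.
    """
    matches = defaultdict(list)
    n = len(sentence)
    for start in range(n):
        # Try every substring of length 2..max_word_len starting here.
        for end in range(start + 2, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                # Attach the matched word to each character it covers.
                for i in range(start, end):
                    matches[i].append(word)
    return [(ch, matches[i]) for i, ch in enumerate(sentence)]


# Toy lexicon (assumed): two herb names and one syndrome term.
toy_lexicon = {"黄芪", "当归", "气虚"}
pairs = match_lexicon("黄芪当归", toy_lexicon)
for ch, words in pairs:
    print(ch, words)
```

Each character ends up with the list of candidate words it belongs to, which is the kind of boundary hint that a character-level model alone does not see.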

Keywords: Natural language processing; Chinese medicine case; Vocabulary enhancement; BERT; BiLSTM-CRF

李旻哲、殷继彬

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China

2024

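Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)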

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(z1)