Entity Recognition of Medical Conversation Based on Word Fusion and Adversarial Training

To address the problems that BERT-BiLSTM-CRF exhibits in Chinese medical dialogue entity recognition, namely insufficient capture of character-word boundary features, weak semantic generalization at entity boundaries, and poor accuracy on complex nested entities, a medical dialogue entity recognition model based on character-word fusion and adversarial training is proposed. First, an external lexicon is used to match the word features corresponding to each character in a sentence; these features are integrated inside the BERT model through a Lexicon Adapter (LA) to form character-word fusion vectors, and adversarial training with Projected Gradient Descent (PGD) is added to generate adversarial samples. Second, the fusion vectors and adversarial samples are passed as training data to a Bidirectional Gated Recurrent Unit (BiGRU), which extracts contextual semantic information. Finally, a Conditional Random Field (CRF) performs the final decoding. Experiments on the IMCS21 Chinese medical dialogue dataset show that the model reaches an F1 score of 92.06%. Compared with the BERT-BiLSTM-CRF model, it effectively improves entity understanding under complex semantics and label recognition accuracy.
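
The lexicon-matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `match_lexicon`, the `max_len` limit, and the toy lexicon are all assumptions made for illustration. The sketch pairs every character with the lexicon words whose span covers it, which is the kind of character-to-word feature a lexicon adapter could then fuse into the character representation.

```python
from typing import List, Set


def match_lexicon(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[List[str]]:
    """For each character, list the lexicon words whose span covers it.

    Hypothetical helper: `max_len` caps the candidate word length.
    """
    matches: List[List[str]] = [[] for _ in sentence]
    for start in range(len(sentence)):
        # Try every candidate span that starts here and has at least two characters.
        for end in range(start + 2, min(start + max_len, len(sentence)) + 1):
            word = sentence[start:end]
            if word in lexicon:
                # Attach the matched word to every character it covers.
                for pos in range(start, end):
                    matches[pos].append(word)
    return matches


# Toy lexicon and sentences invented for this example (not taken from IMCS21).
toy_lexicon = {"咳嗽", "发烧", "支气管", "支气管炎"}
dialogue_pairs = match_lexicon("孩子咳嗽发烧", toy_lexicon)
nested_pairs = match_lexicon("支气管炎", toy_lexicon)
```

In the second call, the first three characters are each paired with both the shorter and the longer word, which shows how overlapping lexicon matches can expose nested entity boundaries to the model.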

NER; deep learning; BERT; BiGRU; CRF; PGD

Tian Haiqiang (田海强), Wang Jizhou (汪济洲), Xu Haizhen (徐海珍), Kong Weizhe (孔维哲)

School of Advanced Manufacturing Engineering, Hefei University, Hefei, Anhui 230601

School of Artificial Intelligence, Anhui University, Hefei, Anhui 230601

entity recognition; deep learning; BERT; BiGRU; CRF; adversarial training

Major Natural Science Research Project of Anhui Provincial Universities

KJ2020ZD58

2024

Journal of Heilongjiang University of Technology (Comprehensive Edition)
Jixi University

Impact factor: 0.211
ISSN:1672-6758
Year, volume (issue): 2024, 24(2)