Named Entity Recognition of Acupuncture Effects Based on a "Deep Learning Model + Dictionary" Approach
Recognition of Named Entities in Acupuncture Literature Based on Dictionary and Deep Learning Model
王晰 1, 柯丽娟 1, 李海燕 1, 高彤 2, 孙华君 1, 雷蕾 1
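Author information
- 1. Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700
- 2. 北京元素领域信息技术有限公司, Beijing 101200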
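Abstract (translated from the Chinese)
Objective To propose, using a dataset of acupuncture RCT literature, a named entity recognition method for acupuncture effects that combines a deep learning model with a dictionary, providing technical support for building an acupuncture-effect knowledge base. Methods The vector representations obtained with word2vec, ALBERT, and "ALBERT + dictionary" were compared, and on this basis a named entity extraction method was proposed that combines a domain dictionary with the ALBERT-BiLSTM-CRF deep learning model. Results In terms of entity extraction, "ALBERT-BiLSTM-CRF + dictionary" performed best (P 92.57%, R 91.42%, F1 91.85%), the ALBERT-BiLSTM-CRF model was somewhat weaker (P 83.10%, R 81.14%, F1 81.98%), and word2vec-BiLSTM-CRF performed worst (P 81.82%, R 70.76%, F1 75.48%). By entity category, the three entity types with the highest precision were acupuncture technique (针法), needling method (刺法), and needling site, at 98%, 97%, and 97% respectively; the three with the lowest precision were symptoms corresponding to adjunct acupoints (配穴对应症状), disease name, and sample size, at 50.00%, 50.68%, and 52.43% respectively. Conclusion Compared with the original ALBERT-BiLSTM-CRF model, the "ALBERT-BiLSTM-CRF + dictionary" model clearly improved precision (P), recall (R), and F1. The model is effective for named entity recognition of acupuncture effects.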
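The abstract does not say how the domain dictionary is combined with the ALBERT representation. One common way to do this, shown here only as a minimal sketch, is to tag each character by forward maximum matching against the dictionary and feed those tags to the model as an extra feature. The function name `dictionary_tags`, the lexicon entries, and the labels `ACUPOINT` and `METHOD` are hypothetical and are not taken from the paper.

```python
def dictionary_tags(text, lexicon, max_len=8):
    """Tag each character with B-/I-<type> or O by forward maximum matching.

    `lexicon` maps a dictionary term to its entity type. This is an
    illustrative assumption, not the authors' documented procedure.
    """
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for n in range(min(max_len, len(text) - i), 0, -1):
            etype = lexicon.get(text[i:i + n])
            if etype is not None:
                tags[i] = "B-" + etype
                tags[i + 1:i + n] = ["I-" + etype] * (n - 1)
                i += n
                break
        else:
            i += 1  # no dictionary term starts at this character
    return tags


# Hypothetical two-entry lexicon and example sentence fragment.
lexicon = {"足三里": "ACUPOINT", "电针": "METHOD"}
example_tags = dictionary_tags("电针足三里治疗", lexicon)
```

In a setup like this, each tag would typically be mapped to a small embedding and concatenated with the character vector before the BiLSTM layer; whether the paper does this is not stated in the abstract.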
Abstract
Objective Based on an acupuncture literature dataset, a named entity recognition method for acupuncture literature that combines a dictionary with a deep learning model is proposed to improve entity recognition in acupuncture literature. Methods Entity recognition methods for acupuncture literature were explored, and the vector representations produced by word2vec, ALBERT, and ALBERT + domain dictionary were compared. On this basis, a named entity extraction method combining a domain dictionary with the ALBERT-BiLSTM-CRF deep learning model was proposed. Results In terms of entity extraction across the three models, word2vec-BiLSTM-CRF achieved a P value of 81.82%, an R value of 70.76%, and an F1 value of 75.48%; ALBERT-BiLSTM-CRF achieved a P value of 83.10%, an R value of 81.14%, and an F1 value of 81.98%; and "ALBERT-BiLSTM-CRF + dictionary" achieved 92.57%, 91.42%, and 91.85%, respectively. In terms of entity categories, the three entity types with the highest precision were acupuncture technique, needling method, and needling site, at 98%, 97%, and 97% respectively, while the three with the lowest precision were symptoms corresponding to adjunct acupoints, disease name, and sample size, at 50.00%, 50.68%, and 52.43% respectively. Conclusion Compared with the original ALBERT-BiLSTM-CRF model, precision, recall, and F1 all increased after the dictionary was added, and the model converged twice as fast with the dictionary as without it. The "ALBERT-BiLSTM-CRF + dictionary" model is effective for identifying named entities in acupuncture literature.
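For readers who want to see how P, R, and F1 figures of this kind are usually obtained, the following is a minimal sketch of entity-level scoring over BIO tag sequences. It assumes exact-span matching and reports both micro and macro averages, because the abstract does not state which averaging scheme was used. The function names and the labels `SITE` and `DISEASE` are hypothetical.

```python
from collections import defaultdict


def extract_entities(tags):
    """Return the set of (type, start, end) spans in a BIO tag sequence."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def prf(tp, n_pred, n_gold):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def evaluate(gold_tags, pred_tags):
    """Score exact-match entities per type, plus micro and macro averages."""
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    counts = defaultdict(lambda: [0, 0, 0])  # type -> [tp, n_pred, n_gold]
    for etype, *_ in pred:
        counts[etype][1] += 1
    for etype, *_ in gold:
        counts[etype][2] += 1
    for etype, *_ in gold & pred:
        counts[etype][0] += 1
    per_type = {t: prf(*c) for t, c in counts.items()}
    micro = prf(*(sum(c[i] for c in counts.values()) for i in range(3)))
    k = len(per_type)
    macro = tuple(sum(v[i] for v in per_type.values()) / k for i in range(3)) if k else (0.0, 0.0, 0.0)
    return per_type, micro, macro


# Hypothetical gold and predicted tags for one seven-token sentence.
gold = ["B-SITE", "I-SITE", "O", "B-DISEASE", "I-DISEASE", "O", "B-SITE"]
pred = ["B-SITE", "I-SITE", "O", "B-DISEASE", "O", "O", "O"]
per_type, micro, macro = evaluate(gold, pred)
```

Micro averaging pools counts over all entity types, while macro averaging takes the unweighted mean of per-type scores, so a macro-averaged F1 need not equal the harmonic mean of the macro-averaged P and R.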
Keywords
BiLSTM-CRF / Acupuncture / Randomized controlled trial / Named entity recognition
Publication year
2024