Named Entity Recognition of Acupuncture Effects Based on "Deep Learning Model + Dictionary"

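Objective Based on a dataset of acupuncture RCT literature, to propose a named entity recognition method for acupuncture effects that combines a deep learning model with a dictionary, providing technical support for building an acupuncture effect knowledge base. Methods This paper compares the vector representations obtained from word2vec, ALBERT, and "ALBERT + dictionary", and on this basis proposes a named entity extraction method that combines a domain dictionary with the ALBERT-BiLSTM-CRF deep learning model. Results In terms of entity extraction, "ALBERT-BiLSTM-CRF + dictionary" performed best (P 92.57%, R 91.42%, F1 91.85%), ALBERT-BiLSTM-CRF was somewhat worse (P 83.10%, R 81.14%, F1 81.98%), and word2vec-BiLSTM-CRF was worst (P 81.82%, R 70.76%, F1 75.48%). By entity category, the three entity types with the highest precision were acupuncture method, needling method, and needling site (98%, 97%, and 97%, respectively), and the three with the lowest precision were symptoms corresponding to matched acupoints, disease name, and sample size (50.00%, 50.68%, and 52.43%, respectively). Conclusion Compared with the original ALBERT-BiLSTM-CRF model, the "ALBERT-BiLSTM-CRF + dictionary" model shows clear improvements in precision (P), recall (R), and F1. The model is effective for named entity recognition of acupuncture effects.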
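The abstract does not say how the domain dictionary is combined with the model. One common option, shown here purely as an assumption, is to tag each character by forward maximum matching against the dictionary and feed those tags to the tagger as an extra feature. The function name `dictionary_tags`, the lexicon entries, and the labels `ACUPOINT` and `METHOD` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def dictionary_tags(text: str, lexicon: Dict[str, str], max_len: int = 8) -> List[str]:
    """Label each character with a BIO tag via forward maximum matching."""
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate first so longer terms win over their prefixes.
        for length in range(min(max_len, len(text) - i), 0, -1):
            term = text[i:i + length]
            if term in lexicon:
                label = lexicon[term]
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + length):
                    tags[j] = f"I-{label}"
                i += length
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical lexicon entries and labels, for illustration only.
lexicon = {"足三里": "ACUPOINT", "电针": "METHOD"}
print(dictionary_tags("电针足三里治疗", lexicon))
```

In a setup like this, the dictionary tags would typically be embedded and concatenated with the ALBERT character vectors before the BiLSTM layer; whether the authors did so is not stated in the abstract.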
Recognition of Named Entities in Acupuncture Literature Based on Dictionary and Deep Learning Model
Objective Based on an acupuncture literature dataset, a named entity recognition method combining a dictionary with a deep learning model is proposed to improve entity recognition in acupuncture literature. Methods Entity recognition methods for acupuncture literature were explored, and the vector representations obtained from word2vec, ALBERT, and ALBERT + domain dictionary were compared. On this basis, a named entity extraction method combining a domain dictionary with the ALBERT-BiLSTM-CRF deep learning model was proposed. Results For entity extraction with the three models, word2vec-BiLSTM-CRF achieved a P value of 81.82%, an R value of 70.76%, and an F1 value of 75.48%; ALBERT-BiLSTM-CRF achieved a P value of 83.10%, an R value of 81.14%, and an F1 value of 81.98%; and "ALBERT-BiLSTM-CRF + dictionary" achieved 92.57%, 91.42%, and 91.85%, respectively. By entity category, the three entity types with the highest precision were acupuncture, needling, and needling site (98%, 97%, and 97%, respectively), while the three with the lowest precision were symptoms corresponding to matched acupoints, disease names, and sample size (50.00%, 50.68%, and 52.43%, respectively). Conclusion Compared with the original ALBERT-BiLSTM-CRF model, precision, recall, and F1 all increased after the dictionary was added, and the model converged twice as fast with the dictionary as without it. The "ALBERT-BiLSTM-CRF + dictionary" model is effective for identifying named entities in acupuncture literature.
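For readers unfamiliar with how P, R, and F1 are usually computed for sequence labelling, the following is a minimal sketch of exact-match, entity-level scoring over BIO tags. It is an assumption about the evaluation protocol: the abstract does not state whether matching was exact or how scores were averaged across entity types. The helper names `extract_spans` and `prf1` and the tag labels are hypothetical.

```python
from typing import List, Set, Tuple


def extract_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end, label) spans from a BIO tag sequence."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        # Close the open span unless this tag continues it.
        if start is not None and not (tag.startswith("I-") and tag[2:] == label):
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def prf1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1."""
    g, p = extract_spans(gold), extract_spans(pred)
    tp = len(g & p)  # an entity counts only if boundaries and label both match
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical tags: the second predicted entity is one character too short.
gold = ["B-METHOD", "I-METHOD", "B-ACUPOINT", "I-ACUPOINT", "I-ACUPOINT", "O", "O"]
pred = ["B-METHOD", "I-METHOD", "B-ACUPOINT", "I-ACUPOINT", "O", "O", "O"]
print(prf1(gold, pred))
```

Under exact matching, a boundary error counts as both a false positive and a false negative, which is one reason entity types with long or variable spans tend to score lower.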

BiLSTM-CRF; Acupuncture; Randomized controlled trial; Named entity identification

王晰、柯丽娟、李海燕、高彤、孙华君、雷蕾

Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700

北京元素领域信息技术有限公司, Beijing 101200

BiLSTM-CRF; Acupuncture and moxibustion; Randomized controlled trial; Named entity recognition

2024

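World Science and Technology-Modernization of Traditional Chinese Medicine
Institute of Policy and Management, Chinese Academy of Sciences; China Association for the Promotion of High-Tech Industry Development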

CSTPCD; Peking University Core Journals
Impact factor: 1.175
ISSN: 1674-3849
Year, volume (issue): 2024, 26(7)