Chinese Medical Named Entity Recognition Incorporating Label Knowledge


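Chinese abstract (translated): Named entity recognition (NER) in the medical domain is an important research topic in information extraction. Its training data come mainly from unstructured text such as clinical trial data, health records, and electronic medical records, and annotating these data requires professionals to invest substantial manpower, material resources, and time. Without large-scale medical training data, medical NER models are prone to recognition errors. To address this problem, this paper proposes a Chinese medical NER method that incorporates label knowledge. After obtaining interpretations of the text labels from a professional domain dictionary, the method encodes the text, the labels, and the label interpretations separately and fuses them with an adaptive fusion mechanism, which balances the information flow between the feature extraction module and the semantic enhancement module and thereby improves model performance. The core idea is that medical entity labels are obtained by summarizing large amounts of medical data, and label interpretations are scientific explanations of those labels; both carry rich prior knowledge of the medical domain, so incorporating them helps the model understand the semantics of entities in the medical domain more accurately and improves recognition. Experimental results show that the method improves three baseline models on the Chinese medical entity extraction dataset (CMeEE-V2) by 0.71%, 0.53%, and 1.17%, respectively, and provides an effective solution for entity recognition in few-shot scenarios.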
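As a minimal sketch of the input-preparation step, assuming character-level tokenization and a small hypothetical label dictionary, the text, the label names, and the label interpretations can be kept as three separate sequences ready for separate encoding. The label names `DIS`, `SYM`, `DRU`, their glosses, and the helpers `LabelKnowledgeInput` and `build_inputs` are illustrative assumptions, not taken from the paper, its dictionary, or the CMeEE-V2 label set.

```python
from dataclasses import dataclass

# Hypothetical entity labels and glosses, written for illustration only;
# the paper obtains interpretations from a professional domain dictionary.
LABEL_GLOSSES = {
    "DIS": "disease: an abnormal condition that impairs normal body function",
    "SYM": "symptom: a subjective sign of a disease reported by the patient",
    "DRU": "drug: a substance used to diagnose, treat, or prevent disease",
}


@dataclass
class LabelKnowledgeInput:
    """The three sequences that are encoded separately (assumed layout)."""
    text: list[str]     # characters of the input sentence
    labels: list[str]   # entity label names
    glosses: list[str]  # one interpretation string per label, same order


def build_inputs(sentence: str, glosses: dict[str, str]) -> LabelKnowledgeInput:
    # Chinese NER is commonly done at character level, so split into characters.
    chars = list(sentence)
    # Sort label names so the label and gloss sequences have a stable order.
    labels = sorted(glosses)
    return LabelKnowledgeInput(
        text=chars,
        labels=labels,
        glosses=[glosses[name] for name in labels],
    )


if __name__ == "__main__":
    example = build_inputs("患者头痛三天", LABEL_GLOSSES)
    print(example.text)    # ['患', '者', '头', '痛', '三', '天']
    print(example.labels)  # ['DIS', 'DRU', 'SYM']
```

Keeping the three sequences apart mirrors the description of encoding text, labels, and interpretations separately before fusion; how the paper actually batches or encodes them is not stated in the abstract.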

English title: Chinese Medical Named Entity Recognition with Label Knowledge

English abstract: Named entity recognition in the medical field is one of the important research topics in information extraction. Its training data mainly come from unstructured texts such as clinical trial data, health records, and electronic medical records. However, labeling these data requires professionals to spend a great deal of manpower, material resources, and time. In the absence of large-scale medical training data, named entity recognition models in the medical field are prone to recognition errors. To solve this problem, this paper proposes a Chinese medical named entity recognition method that integrates label knowledge: after the interpretation of each text label is obtained from a professional domain dictionary, the text, the label, and the label interpretation are encoded separately and fused by an adaptive fusion mechanism, which effectively balances the information flow of the feature extraction module and the semantic enhancement module and thereby improves model performance. The core idea is that medical entity labels are obtained by summarizing a large amount of medical data, and label interpretations are the result of scientifically explaining and describing those labels. By incorporating this rich prior knowledge of the medical field, the model can understand the semantics of entities in the medical domain more accurately and recognize them better. Experimental results show that the method achieves improvements of 0.71%, 0.53%, and 1.17% on three baseline models on the Chinese medical entity extraction dataset (CMeEE-V2), and provides an effective method for entity recognition in few-shot scenarios.
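The abstract does not give the equations of the adaptive fusion mechanism. One common way to balance two information flows is a learned sigmoid gate that decides, per dimension, how much of each input to keep. The sketch below assumes that formulation; the functions `sigmoid` and `gated_fusion` and the parameters `w_gate` and `b_gate` are hypothetical, and this is not the authors' implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(
    h_text: list[float],
    h_know: list[float],
    w_gate: list[list[float]],
    b_gate: list[float],
) -> list[float]:
    """Fuse text features with label-knowledge features via a sigmoid gate.

    Assumed formulation (hypothetical, not from the paper):
        g     = sigmoid(W_g [h_text; h_know] + b_g)   # one gate per dimension
        fused = g * h_text + (1 - g) * h_know
    w_gate has shape (d, 2d) and b_gate has length d.
    """
    if len(h_text) != len(h_know):
        raise ValueError("h_text and h_know must have the same dimension")
    concat = h_text + h_know  # list concatenation gives [h_text; h_know]
    fused = []
    for i in range(len(h_text)):
        z = sum(w * x for w, x in zip(w_gate[i], concat)) + b_gate[i]
        g = sigmoid(z)
        # g near 1 keeps the text features; g near 0 keeps the label knowledge.
        fused.append(g * h_text[i] + (1.0 - g) * h_know[i])
    return fused


if __name__ == "__main__":
    zeros = [[0.0] * 4 for _ in range(2)]
    print(gated_fusion([1.0, 2.0], [3.0, 4.0], zeros, [0.0, 0.0]))  # [2.0, 3.0]
```

With all gate weights and biases at zero, the gate is 0.5 in every dimension and the output is the element-wise average of the two vectors; a large positive bias pushes the output toward the text features, and a large negative bias toward the label-knowledge features. In a trained model the gate parameters would be learned, which is what lets the fusion adapt.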

English keywords:

Chinese medical named entity recognition; Label knowledge; Prior knowledge; Adaptive fusion mechanism; Few shot

Authors:

Yin Baosheng (尹宝生), Zhou Peng (周澎)


Affiliation:

Human-Machine Intelligence Research Center, Shenyang Aerospace University, Shenyang 110136

Keywords:

Chinese medical named entity recognition; label knowledge; prior knowledge; adaptive fusion mechanism; few-shot

Funding:

Project of the Liaoning Provincial Department of Education

Project number:

LJKMZ20220536

Publication year:

2024

DOI:

10.11896/jsjkx.230500203

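Computer Science (计算机科学)

Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)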


CSTPCD; Peking University Core Journal (北大核心)

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2024, 51(z1)

Number of references: 29