Chinese Medical Named Entity Recognition Integrating Word-Level Segment Information




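Chinese abstract: Chinese medical named entity recognition (NER) is a fundamental task in the medical domain and plays an important role in many downstream tasks, such as knowledge graphs. Common NER methods fall into two groups: those based on word-level information and those based on segment-level information. Prior studies have shown that fusing the two yields better performance. However, methods that fuse word-level and segment-level information have not yet been fully studied for Chinese medical NER. Existing fusion methods also assign the same weight to every word in a segment and ignore the different contribution of each word. In medical entities, each word has a different degree of relevance to the entity (segment), and ignoring these differences degrades medical NER performance. Based on an analysis of the characteristics of Chinese medical entities, this paper proposes a word-level segment information extraction method (Word-Level Segment Information Extraction, WL-SIE). The method assigns a set of weight matrices to each word in an entity and learns the association between each word and the entity. After interacting with the entity phrase, it outputs distinct word-level segment information. Experiments on the CCKS2017 and CMeEE Chinese clinical NER datasets show that WL-SIE improves the F1 score by 3% to 5% over the compared methods. It performs particularly well in scenarios with imbalanced entity samples and on long-entity recognition.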

English title: Research on Chinese Medical Named Entity Recognition Integrating Word-level Segment Information

English abstract: Chinese medical named entity recognition (NER) is a fundamental task in the medical field and plays an important role in various downstream tasks, such as knowledge graphs. NER methods can generally be divided into two types: those based on word-level information and those based on segment-level information. Some studies have shown that fusing the two types of information achieves better performance. However, the integration of word-level and segment-level information has not been thoroughly studied for the Chinese medical NER task. Existing integration methods also assign equal weight to each word in a segment, which ignores the different contributions of individual words. In medical entities, each word has a different degree of correlation with the entity (segment), and ignoring these differences degrades medical NER performance. Based on an analysis of the characteristics of Chinese medical entities, we propose a word-level segment information extraction method called WL-SIE. This method assigns a set of weight matrices to each word in the entity to learn the distinct associations between words and entities. After interacting with entity phrases, it outputs word-level segment information. Experimental results on the CCKS2017 and CMeEE Chinese clinical NER datasets demonstrate that WL-SIE improves the F1 score by 3% to 5% over the compared methods. The gains are especially strong in scenarios with imbalanced entity samples and on long-entity recognition.
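The abstract describes WL-SIE only at a high level, so the following pure-Python sketch is an illustrative assumption, not the authors' model. It contrasts the equal-weight baseline with a per-position weight-matrix set. In the baseline, mean pooling gives every word in a span of length L the weight 1/L. In the sketch, each word position i has its own matrix W_i. That matrix maps the concatenation of the word vector and the pooled span vector to a word-specific segment feature. The names mean_pool and word_level_segment_info are hypothetical, as are the concatenation and the use of mean pooling as the span vector. In a real model the matrices would be learned parameters.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Baseline span vector: every word in the span gets the same weight 1/L."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def word_level_segment_info(words: List[Vector], weight_set: List[Matrix]) -> List[Vector]:
    """Hypothetical word-level segment information (illustrative, not WL-SIE itself).

    words[i] is the vector of the i-th word in the span, and weight_set[i] is a
    matrix reserved for position i. Each position's matrix is applied to
    [word_i ; span_vector], so every word gets its own segment feature instead
    of one vector shared by the whole span.
    """
    if len(weight_set) < len(words):
        raise ValueError("need one weight matrix per word position")
    segment = mean_pool(words)
    # List concatenation builds the 2d-dimensional input [word_i ; segment].
    return [matvec(weight_set[i], w + segment) for i, w in enumerate(words)]


if __name__ == "__main__":
    words = [[1.0, 0.0], [0.0, 1.0]]
    weights = [
        [[1, 0, 1, 0], [0, 1, 0, 1]],  # position 0: word plus span vector
        [[1, 0, 0, 0], [0, 1, 0, 0]],  # position 1: word only
    ]
    print(mean_pool(words))                         # [0.5, 0.5] for every word
    print(word_level_segment_info(words, weights))  # [[1.5, 0.5], [0.0, 1.0]]
```

In this toy run, the baseline assigns both words the identical span vector [0.5, 0.5]. With a different matrix per position, each word receives a different vector. This is the kind of word-specific segment information the abstract contrasts with equal weighting.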

English keywords:

named entity recognition; deep neural network; word-level information; segment-level information; Chinese medical information processing

Authors:

Wang Haipeng (王海鹏), Du Fang (杜方), Song Lijuan (宋丽娟), Li Ting (李婷)


Affiliations:

School of Information Engineering, Ningxia University, Yinchuan, Ningxia 750021

School of Mathematics and Statistics, Ningxia University, Yinchuan, Ningxia 750021

Keywords:

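named entity recognition; deep neural network; word-level information; segment-level information; Chinese medical information processing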

Funding:

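National Natural Science Foundation of China; Natural Science Foundation of Ningxia Hui Autonomous Region; Natural Science Foundation of Ningxia Hui Autonomous Region; Key Research and Development Program of Ningxia Hui Autonomous Region; Key Research and Development Program of Ningxia Hui Autonomous Region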

Project numbers:

62062058; 2021AAC03118; 2021AAC03022; 2019BEB04023; 2021BEE03013

Publication year:

2024

DOI:

10.20165/j.cnki.ISSN1673-629X.2024.0091

Computer Technology and Development

Shaanxi Computer Society


CSTPCD

Impact factor: 0.621

ISSN: 1673-629X

Year, volume (issue): 2024, 34(6)

Number of references: 4