Multi-granularity Feature Fusion for Biomedical Named Entity Recognition and Normalization

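Chinese abstract (translated): To extract rich entity information and its normalized expressions from biomedical literature, this paper proposes a multi-granularity feature fusion approach for biomedical named entity recognition and normalization (MGFFA). By integrating character-level, word-level, and concept-level textual information, the approach significantly strengthens the model's learning capability. It also includes a memory bank that stores and synthesizes information from different levels, enabling a deeper understanding of the complex relationships between entities and their normalization labels. Used together with pre-trained models, MGFFA captures coarse-grained semantic representations of the text and also analyzes word-formation features in detail, thereby improving recognition accuracy for long-span entities. Experimental results on the NCBI and BC5CDR datasets show that the model outperforms the baseline models overall.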

English title: Multi-granularity Feature Fusion for Biomedical Named Entity Recognition and Normalization

English abstract: To extract rich entity information and normalized expressions from biomedical literature, this study proposes a multi-granularity feature fusion approach for biomedical named entity recognition and normalization (MGFFA). By integrating character-level, word-level, and concept-level textual information, the model significantly enhances its learning capability. It also incorporates a memory bank that stores and synthesizes information from different levels to achieve a deeper understanding of the complex relationships between entities and their normalized labels. Combined with pre-trained models, MGFFA not only captures coarse-grained semantic representations of text but also analyzes features at the morphological level in detail, thereby improving the recognition accuracy of long-span entities. Experimental results on the NCBI and BC5CDR datasets demonstrate that the model outperforms other baseline models overall.

English keywords:

biomedical named entity recognition; biomedical named entity normalization; multi-task learning; memory network

Authors:

Liu Tong (刘彤), Shi Changling (石昌岭), Ni Weijian (倪维健)

Author affiliation:

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590

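Keywords (translated from Chinese):

biomedical named entity recognition; biomedical named entity normalization; multi-task learning; memory network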

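Funding:

Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project; Natural Science Foundation of Shandong Province; Shandong University of Science and Technology Fund for Top Teaching Talents among Young Teachers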

Project numbers:

2022ZD0119500; ZR2022MF319; BJ20211110

Publication year:

2024

DOI:

10.15888/j.cnki.csa.009640

Journal: Computer Systems & Applications (计算机系统应用)

Publisher: Institute of Software, Chinese Academy of Sciences

CSTPCD

Impact factor: 0.449

ISSN: 1003-3254

Year, volume (issue): 2024, 33(11)