Multi-granularity Feature Fusion for Biomedical Named Entity Recognition and Normalization

To extract rich entity information and normalized expressions from biomedical literature, this study proposes a multi-granularity feature fusion approach for biomedical named entity recognition and normalization (MGFFA). By integrating character-level, word-level, and concept-level textual information, the approach significantly strengthens the model's learning capability. It also incorporates a memory bank that stores and synthesizes information from the different levels, allowing a deeper understanding of the complex relationships between entities and their normalized labels. Combined with pre-trained models, MGFFA captures coarse-grained semantic representations of text and also analyzes word-formation (morphological) features in detail, which improves recognition accuracy for long-span entities. Experimental results on the NCBI and BC5CDR datasets show that the model outperforms the other baseline models overall.

biomedical named entity recognition; biomedical named entity normalization; multi-task learning; memory network

Liu Tong (刘彤), Shi Changling (石昌岭), Ni Weijian (倪维健)

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China

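Sci-Tech Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0119500); Natural Science Foundation of Shandong Province (ZR2022MF319); Shandong University of Science and Technology Young Teachers' Teaching Top Talent Training Fund (BJ20211110)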

2024

Computer Systems & Applications (计算机系统应用)
Institute of Software, Chinese Academy of Sciences

CSTPCD
Impact factor: 0.449
ISSN:1003-3254
Year, volume (issue): 2024, 33(11)