Multi-granularity Feature Fusion for Biomedical Named Entity Recognition and Normalization
To extract rich entity information and its normalized forms from biomedical literature, this study proposes a multi-granularity feature fusion approach for biomedical named entity recognition and normalization (MGFFA). The model integrates character-level, word-level, and concept-level textual information, which significantly strengthens its learning capability. It also includes a memory bank that stores and combines information from the different levels, allowing the model to capture the complex relationships between entities and their normalization labels. Used together with pre-trained models, MGFFA not only captures coarse-grained semantic representations of the text but also analyzes word-formation features in detail, thereby improving recognition accuracy for long-span entities. Experimental results on the NCBI and BC5CDR datasets show that the model outperforms the other baseline models overall.
biomedical named entity recognition; biomedical named entity normalization; multi-task learning; memory network
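
As a rough illustration of the idea summarized in the abstract, the following minimal sketch fuses three granularities of features for one token and then reads from a small memory bank. The abstract does not specify how MGFFA performs either step, so the choices here are assumptions: fusion by concatenation, a memory read by dot-product attention, and a residual sum to combine them. The names `fuse_token` and `read_memory` and all toy values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_token(char_vec, word_vec, concept_vec):
    """Assumed fusion: concatenate character-, word-, and concept-level features."""
    return list(char_vec) + list(word_vec) + list(concept_vec)


def read_memory(query, memory):
    """Assumed memory read: attention-weighted sum of the memory slots."""
    weights = softmax([dot(query, slot) for slot in memory])
    dim = len(memory[0])
    read = [sum(w * slot[i] for w, slot in zip(weights, memory)) for i in range(dim)]
    return read, weights


# Toy features for a single token (values are made up for illustration).
char_vec = [0.1, 0.2]      # character-level (word-formation) features
word_vec = [0.5, -0.3]     # word-level features, e.g. from a pre-trained encoder
concept_vec = [1.0, 0.0]   # concept-level features, e.g. from a terminology

fused = fuse_token(char_vec, word_vec, concept_vec)

# Two memory slots with the same dimensionality as the fused vector.
memory = [[0.0] * 6, [1.0] * 6]
read, weights = read_memory(fused, memory)

# Residual combination of the token feature and the memory read.
enriched = [f + r for f, r in zip(fused, read)]
```

In a multi-task setting, a vector such as `enriched` would be passed to two output heads, one for recognition tags and one for normalization labels; those heads are omitted here.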