Chinese Journal of Health Informatics and Management (中国卫生信息管理杂志), 2024, Vol. 21, Issue 6: 902-908, 914. DOI: 10.3969/j.issn.1672-5166.2024.06.018

Research on Large Language Model-based Few-shot Medical Named Entity Recognition

赵从朴¹, 朱卫国¹, 赵飞², 郭安辉¹

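Author information

  • 1. Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing 100730, China
  • 2. Center for Health Statistics and Information, National Health Commission, Beijing 100810, China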

Abstract

Objective: To perform few-shot medical named entity recognition (NER) with a large language model. Methods: The medical NER task is recast as a text generation task, and a prompt template specific to medical NER is constructed so that the large language model generates the label sequence of medical entities during text generation. A small number of similar annotated samples are retrieved from a medical text corpus and used as demonstrations for in-context learning, enabling medical NER in few-shot settings. Results: The proposed method achieved precision, recall, and F1 of 50.54%, 47.12%, and 48.77%, respectively, all significantly higher than those of traditional machine learning and deep learning algorithms. Judicious use of multiple samples as demonstrations further improves prediction performance. Conclusion: The proposed method requires no parameter updates to the model and depends very little on annotated data, which improves its generalization ability.
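The pipeline summarized in the abstract (retrieve a few similar annotated samples, place them in a prompt as demonstrations, let the model generate entity labels as text) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `Example` type, the character-bigram Jaccard similarity used for retrieval, the wording of the prompt, and the `mention | label` output format are all hypothetical, and the call to the language model itself is left out. The `f1_score` helper only applies the standard harmonic-mean definition of F1 to the figures reported in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A hypothetical annotated sample: a sentence plus (mention, label) pairs."""
    text: str
    entities: list[tuple[str, str]]


def char_bigrams(text: str) -> set[str]:
    """Character bigrams of a string (works for unsegmented text as well)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of character bigrams; an assumed stand-in for the
    paper's retrieval method, which the abstract does not specify."""
    x, y = char_bigrams(a), char_bigrams(b)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def retrieve(query: str, pool: list[Example], k: int) -> list[Example]:
    """Return the k annotated samples most similar to the query text."""
    ranked = sorted(pool, key=lambda ex: similarity(query, ex.text), reverse=True)
    return ranked[:k]


def format_entities(entities: list[tuple[str, str]]) -> str:
    """Serialize entities as 'mention | label' pairs separated by '; '."""
    return "; ".join(f"{mention} | {label}" for mention, label in entities) or "none"


def build_prompt(query: str, demos: list[Example], labels: list[str]) -> str:
    """Assemble an instruction, the demonstrations, and the query into one prompt."""
    lines = [
        f"Extract medical entities from the text. Allowed labels: {', '.join(labels)}.",
        "Answer as 'mention | label' pairs separated by '; '.",
        "",
    ]
    for ex in demos:
        lines += [f"Text: {ex.text}", f"Entities: {format_entities(ex.entities)}", ""]
    lines += [f"Text: {query}", "Entities:"]
    return "\n".join(lines)


def parse_entities(generated: str) -> list[tuple[str, str]]:
    """Parse the model's generated text back into (mention, label) pairs."""
    pairs = []
    for chunk in generated.split(";"):
        if "|" in chunk:
            mention, label = (part.strip() for part in chunk.split("|", 1))
            if mention and label:
                pairs.append((mention, label))
    return pairs


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0
```

In use, the string returned by `build_prompt` would be sent to whichever large language model is available, and the generated continuation passed to `parse_entities`. No model weights are updated at any point; only the prompt changes from query to query. As a consistency check on the reported results, `f1_score(50.54, 47.12)` rounds to 48.77, which matches the F1 value in the abstract.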

Key words

large language model / medical named entity recognition / few-shot

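Publication year

2024
Chinese Journal of Health Informatics and Management (中国卫生信息管理杂志)
Center for Health Statistics and Information, Ministry of Health (卫生部统计信息中心)

CSTPCD
Impact factor: 1.2
ISSN: 1672-5166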