Research on Few-shot Medical Named Entity Recognition Based on Large Language Models

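Abstract: Objective: To perform few-shot medical named entity recognition (NER) with a large language model. Methods: The medical NER task is recast as a text generation task, and a prompt template specific to medical NER is constructed. The large language model produces the label sequence of the medical entities as part of text generation. A small number of similar annotated samples are retrieved from a medical text corpus as demonstrations and combined with in-context learning, which enables medical NER in few-shot settings. Results: The method achieved precision, recall, and F1 of 50.54%, 47.12%, and 48.77%, respectively, all significantly better than traditional machine learning and deep learning algorithms. Judicious use of multiple samples as demonstrations further improved prediction performance. Conclusion: The proposed method requires no parameter updates to the model and depends on almost no annotated data, which improves its generalization ability.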
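
The three reported figures can be checked against each other. The short sketch below assumes that the first figure is precision (the usual companion of recall) and that F1 is the standard harmonic mean of precision and recall; the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent.
reported_precision = 50.54
reported_recall = 47.12

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 48.77
```

Under these assumptions the computed value agrees with the reported F1 of 48.77%, which supports reading the first figure as precision rather than overall accuracy.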

English title: Research on Large Language Model-based Few-shot Medical Named Entity Recognition

English abstract: Objective: To use a large language model to achieve few-shot medical named entity recognition. Methods: The medical named entity recognition task was converted into a text generation task, a prompt template specific to medical named entity recognition was constructed, and the large language model was made to generate the sequence of medical entity labels during text generation. A small amount of similar labeled data was retrieved from a medical text corpus as examples and combined with in-context learning to achieve medical named entity recognition in few-shot scenarios. Results: The proposed method achieved precision, recall, and F1 scores of 50.54%, 47.12%, and 48.77%, respectively, all significantly higher than those obtained by traditional machine learning and deep learning algorithms. Reasonable use of multiple samples as examples can further enhance the model's predictive performance. Conclusion: The proposed method does not need to update the parameters of the model and relies on almost no data annotation, which improves its generalization ability.
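
The retrieve-then-prompt procedure described in the abstract might look roughly like the sketch below. Everything here is an assumption made for illustration: the abstract does not say which similarity measure, prompt wording, label set, or model was used, so the character-bigram Jaccard similarity, the template text, the sample sentences, and the names `retrieve_examples` and `build_prompt` are all hypothetical. The call to the language model itself is left out.

```python
from typing import List, Set, Tuple

# (sentence, annotated entities); a hypothetical annotation format.
Example = Tuple[str, str]


def char_bigrams(text: str) -> Set[str]:
    """Character bigrams; a simple stand-in for a real retriever."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the bigram sets of two strings."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x or not y:
        return 0.0
    return len(x & y) / len(x | y)


def retrieve_examples(query: str, pool: List[Example], k: int = 2) -> List[Example]:
    """Return the k labeled examples most similar to the query sentence."""
    ranked = sorted(pool, key=lambda ex: jaccard(query, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, examples: List[Example]) -> str:
    """Assemble an in-context prompt: instruction, demonstrations, then the query."""
    lines = ["Extract the medical entities from the sentence and label each one."]
    for sentence, labels in examples:
        lines.append(f"Sentence: {sentence}")
        lines.append(f"Entities: {labels}")
    lines.append(f"Sentence: {query}")
    lines.append("Entities:")
    return "\n".join(lines)


# Hypothetical labeled pool; not data from the paper.
pool = [
    ("The patient reports chest pain and shortness of breath.",
     "chest pain [symptom]; shortness of breath [symptom]"),
    ("Aspirin 100 mg was prescribed daily.", "Aspirin [drug]"),
    ("MRI of the brain showed no lesion.", "MRI [examination]; brain [body part]"),
]
query = "The patient reports chest pain after taking aspirin."
prompt = build_prompt(query, retrieve_examples(query, pool, k=2))
print(prompt)
```

The resulting prompt ends with an open `Entities:` line, so a text generation model would continue it with the label sequence for the query sentence. No model parameters are updated at any point, which matches the setting the abstract describes.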

English keywords:

large language model; medical named entity recognition; few-shot

Authors:

赵从朴, 朱卫国, 赵飞, 郭安辉

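Author affiliations:

Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing 100730

Center for Health Statistics and Information, National Health Commission, Beijing 100810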

Keywords:

大语言模型 (large language model); 医学命名实体识别 (medical named entity recognition); 小样本 (few-shot)

Publication year:

2024

DOI:

10.3969/j.issn.1672-5166.2024.06.018

Chinese Journal of Health Informatics and Management

Center for Health Statistics and Information, Ministry of Health

CSTPCD

Impact factor: 1.2

ISSN: 1672-5166

Year, volume (issue): 2024, 21(6)