Study on Named Entity Extraction in TCM Medical Records Based on Large Language Models
Li Panfei (李盼飞) 1, Yang Xiaokang (杨小康) 2, Bai Yichen (白逸晨) 1, Li Haiyan (李海燕) 1
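Author information
- 1. Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
- 2. School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing 100029, China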
Abstract
The era of artificial intelligence has given the vast body of traditional Chinese medicine (TCM) medical records greater academic value. However, the non-standardized text of these records and the wide variety of named entity types seriously hinder in-depth research on them. Building on a review of how TCM medical record formats have evolved, an analysis of the structural elements of medical records, and the construction of a medical record information model, this study developed prompts for extracting medical record entities with large language models, explored the automated extraction of named entities from medical records using large language models, and ultimately developed a tool for structuring medical record text. The study explores a feasible path for research on structuring TCM medical records and for extracting scientific data from large-scale collections of TCM medical records, and it provides a data foundation for artificial intelligence research based on TCM medical records.
Key words
TCM medical records/large language models/named entity extraction/medical record information model/artificial intelligence
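Funding
General Program of the China Postdoctoral Science Foundation (2023M743920)
Scientific and Technological Innovation Project of the China Academy of Chinese Medical Sciences (CI2021B002)
Self-selected Topic Project of the Fundamental Research Funds of the China Academy of Chinese Medical Sciences (ZZ160315)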
Publication year
2024