
Knowledge Mapping of TCM Formulas Based on Data Set (基于方剂数据集的知识图谱构建研究)

Objective: To construct a knowledge graph from a dataset of traditional Chinese medicine (TCM) formulas, so as to systematically display formula entities and the relationships among them. Methods: First, a standardized workflow for formula data processing and knowledge graph construction was established, and the formula dataset was obtained. Next, the best-performing model was selected from four commonly used named entity recognition (NER) models and used for entity extraction. Finally, the knowledge graph was built in the Neo4j graph database. Results: The BERT-BiLSTM-CRF model (bidirectional encoder representations from transformers, bidirectional long short-term memory, conditional random field) was selected. It extracted medical entities such as symptoms, TCM and Western medicine disease names, and TCM syndromes from the dataset, with an average F1 score of 90.55%. A standardized formula dataset was produced and a formula knowledge graph was constructed. Conclusion: The medical entities extracted with this method provide a reliable data basis for systematically displaying formula entities and their relationships in TCM clinical practice and scientific research. The resulting knowledge graph supports knowledge retrieval for TCM formulas. It helps uncover latent knowledge and internal relationships in formula data and lays a solid foundation for information integration and knowledge discovery in TCM, thereby promoting the modernization of TCM.
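The reported F1 score is easier to interpret with a concrete calculation in hand. The following is a minimal sketch of entity-level precision, recall, and F1 for a sequence labeler. It assumes BIO tagging and micro-averaging over all entities; the abstract does not state which tagging scheme or averaging method the authors used. The function names and the `SYMPTOM` and `DISEASE` labels are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode one BIO tag sequence into a set of typed entity spans.

    Stray "I-" tags that do not continue an open entity of the same
    type are ignored (an assumption; other decoders treat them as "B-").
    """
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel flushes an entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside and label is not None:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]],
              pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact span matches."""
    tp = n_pred = n_gold = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)       # a match needs the same type and boundaries
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-SYMPTOM", "I-SYMPTOM", "O", "B-DISEASE", "I-DISEASE"]]
    pred = [["B-SYMPTOM", "I-SYMPTOM", "O", "B-DISEASE", "O"]]
    # The DISEASE span is predicted with the wrong right boundary,
    # so only one of the two entities counts as correct.
    print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Under exact-match scoring a prediction with a wrong boundary earns no partial credit, which is why entity-level F1 is usually lower than token-level accuracy on the same output.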

Keywords: TCM formula; Data processing; Knowledge graph; Normalization; Named entity recognition; Neo4j graph database; BERT-BiLSTM-CRF model; TCM

Li Can (李灿), Zhen Kehan (镇可涵), Tang Dongxin (唐东昕), Xie Dan (解丹)


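School of Information Engineering, Hubei University of Chinese Medicine, Wuhan 430065

The First Affiliated Hospital of Guizhou University of Traditional Chinese Medicine, Guiyang 550001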


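Funding: National Key R&D Program of China, "Modernization of Traditional Chinese Medicine" Key Special Project (2019YFC1712504); Open Fund of the Guangdong Provincial Key Laboratory of Traditional Chinese Medicine Informatization (2021502)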

2024

World Chinese Medicine (世界中医药)
World Federation of Chinese Medicine Societies


Indexed in: CSTPCD; CHSSCD; Peking University Core Journals (北大核心)
Impact factor: 1.266
ISSN: 1673-7202
Year, volume (issue): 2024, 19(9)