
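Using Python-Based Machine Learning to Explore Regularities in Decision Trees of the Pathways by Which Chinese Medicine Treats Gastric Cancer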

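Objective: Gastric cancer (GC) is a cancer with both high incidence and high mortality. There is abundant evidence for the clinical treatment of GC with Chinese herbal compound formulae, indicating that traditional Chinese medicine (TCM) performs well in the treatment of GC. This paper carries out basic research on clinical Chinese herbal formulae for treating GC, using relatively mature research ideas and methods from other disciplines to explore the regularities of TCM at different dimensions.

Methods: Literature on the treatment of GC was screened from the China National Knowledge Infrastructure (CNKI) database, the Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (TCMSP) (http://tcmspw.com/tcmsp.php), the BATMAN-TCM database (http://bionet.ncpsb.org.cn/batman-tcm/index.php/), and the SymMap database (http://www.symmap.org), covering the period from database inception to March 30, 2023. A decision tree model was built in Python. GC formulae with clinical treatment evidence were compiled from the literature, and the herbs that appeared with high frequency were studied at the molecular level to form a "herb-treatment pathway" matrix. The "herb-treatment pathway" matrix without clinical evidence was then fed into the decision tree model for learning and correction, yielding an association cohort of treatment pathways. The output of the decision tree model was validated by bioinformatics data analysis.

Results: (1) A total of 26 high-frequency herbs for treating GC were identified, such as Atractylodes macrocephala (Baizhu), Astragalus membranaceus (Huangqi), and Poria cocos (Fuling). (2) A total of 225 pathways related to GC were found. (3) Using entropy calculation in the decision tree model built in Python, 18 pathways with an important role in GC were identified among the 225 pathways enriched for proteins of herbs with clinical evidence. (4) The decision tree model computed the association cohort "Chemical carcinogenesis-Tryptophan metabolism-PI3K-Akt signaling pathway-Antigen processing and presentation". (5) The 18 pathways behaved differently for herbs with different labels: high values (dark areas) for "No"-labelled herbs were relatively concentrated, whereas high values (dark areas) for "Yes"-labelled herbs were relatively scattered. (6) Clustering of q values and p values suggested that "Fatty acid biosynthesis" and "Antigen processing and presentation" both pointed to "1-Yes", "Vibrio cholerae infection" and "Dilated cardiomyopathy" both pointed to "1-No", and "Breast cancer" and "Protein digestion and absorption" both pointed to "0-Yes".

Conclusion: The pathway cohort "Chemical carcinogenesis-Tryptophan metabolism-PI3K-Akt signaling pathway-Antigen processing and presentation" is an association cohort among the pathways by which Chinese herbs treat GC. The bioinformatics results and the results of the Python-built machine learning decision tree model were highly similar in discriminating whether a herb has a therapeutic effect on GC, which supports the accuracy of the decision tree model at the bioinformatics level. In addition, mature computational models built in Python make it possible to study the regularities of clinical compound herbal formulae in disease treatment from different angles. This is a new attempt in research on clinical Chinese herbal formulae; combined with multidisciplinary, multi-angle analysis, it actively promotes the modern study of TCM and is of significance for human health.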
Python-Based Machine Learning to Investigate the Regularity of Decision Tree of the Pathway of Chinese Medicine in the Treatment of Gastric Cancer
Objective: Gastric cancer (GC) is a malignancy characterized by high incidence and high mortality. A large body of evidence supports clinical Chinese herbal formulae for the treatment of GC, which indicates that traditional Chinese medicine (TCM) has remarkable efficacy in GC treatment. This study focused on fundamental research into TCM formulae for treating GC, employing established research methodologies and approaches from other disciplines to explore the regularities of TCM across various dimensions.

Methods: Literature on the treatment of GC published up to March 30, 2023 was retrieved from the China National Knowledge Infrastructure (CNKI) database, the Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (TCMSP, http://tcmspw.com/tcmsp.php), the BATMAN-TCM database (http://bionet.ncpsb.org.cn/batman-tcm/index.php/), and the SymMap database (http://www.symmap.org). A decision tree model was constructed in Python to analyze clinically evidenced GC formulations identified through literature statistics. The herbs with high frequency were studied at the molecular level to form a "TCM-treatment pathway" matrix, and the "TCM-treatment pathway" matrix lacking clinical evidence was then input into the decision tree model for learning and refinement, resulting in an association cohort of treatment pathways. Finally, the output of the decision tree model was validated using bioinformatics data analysis.

Results: (1) A total of 26 high-frequency Chinese herbs for treating GC were identified, such as Atractylodes macrocephala, Astragalus membranaceus, and Poria cocos. (2) A total of 225 pathways related to GC were found. (3) Using entropy calculation with the decision tree model built in Python, 18 pathways with important effects on GC were identified from the 225 pathways enriched with proteins associated with clinically evidenced TCMs for GC treatment. (4) The decision tree model calculated the association cohort "Chemical carcinogenesis-Tryptophan metabolism-PI3K-Akt signaling pathway-Antigen processing and presentation". (5) The performance of the 18 pathways varied across herbs with different labels. For the "No" label, TCMs exhibited relatively concentrated distributions of high values (depicted by dark areas), whereas for the "Yes" label, TCMs showed relatively scattered distributions of high values. (6) The clustering results of q values and p values suggested that "Fatty acid biosynthesis" and "Antigen processing and presentation" were associated with "1-Yes", "Vibrio cholerae infection" and "Dilated cardiomyopathy" were associated with "1-No", and "Breast cancer" and "Protein digestion and absorption" were associated with "0-Yes".

Conclusion: The pathway cohort "Chemical carcinogenesis-Tryptophan metabolism-PI3K-Akt signaling pathway-Antigen processing and presentation" represents a correlated sequence within the TCM treatment pathways for GC. The high degree of similarity between the results obtained from bioinformatics analysis and those of the machine learning decision tree model built in Python supports the accuracy of the decision tree algorithm in determining whether a TCM has a therapeutic effect on GC. Additionally, mature computational models built in Python enable the investigation of the regularities of clinical TCM formulations in disease treatment from various perspectives. This represents a new attempt in clinical TCM formulation research, combining multidisciplinary and multi-angle analyses, which contributes to the modernization of TCM research and has implications for human health.
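
The abstract states that the decision tree model used entropy calculation to pick out important pathways from a herb-by-pathway matrix, but it does not give the implementation. The following is a minimal sketch of that general idea only, not the authors' code: it computes Shannon entropy and information gain for binary pathway columns against Yes/No herb labels, using the standard library alone. The herb names, the 0/1 matrix values, and the function names `entropy` and `information_gain` are all hypothetical and chosen purely for illustration; they are not data from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on one feature column."""
    total = len(labels)
    gain = entropy(labels)
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain


# Hypothetical toy data (NOT from the study): four placeholder herbs,
# each labelled "Yes"/"No" for clinical evidence in GC treatment.
herbs = ["herb_A", "herb_B", "herb_C", "herb_D"]
labels = ["Yes", "Yes", "No", "No"]

# Hypothetical herb-by-pathway matrix: 1 means the herb's target proteins
# are enriched in the pathway, 0 means they are not. Values are invented.
matrix = {
    "PI3K-Akt signaling pathway": [1, 1, 0, 0],
    "Tryptophan metabolism": [1, 0, 1, 0],
    "Chemical carcinogenesis": [1, 1, 1, 0],
}

# Rank pathways by how much they reduce uncertainty about the label.
ranked = sorted(matrix,
                key=lambda p: information_gain(matrix[p], labels),
                reverse=True)

for pathway in ranked:
    print(f"{pathway}: gain = {information_gain(matrix[pathway], labels):.3f}")
```

A decision tree that splits on entropy chooses, at each node, the column with the highest information gain, so a ranking like this corresponds to the choice made at the root split. In practice a library implementation such as scikit-learn's `DecisionTreeClassifier(criterion="entropy")` would typically be used instead of hand-written functions; whether the study did so is not stated in the abstract.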

Python; Machine Learning; Gastric Cancer; Decision Tree; Clinical Traditional Chinese Medicine Formula; Therapeutic Pathway

SONG Jian, MENG Kaiqiang, SHEN Shuwen, LEI Genping, WEI Yonghong, SHI Shaonan, HUI Jianping, WANG Jiehong, XU Peng, ZHANG Yun


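Shaanxi University of Chinese Medicine, Xianyang, Shaanxi 712046

Shaanxi Provincial Hospital of Chinese Medicine, Xi'an, Shaanxi 710003

Xi'an Ninth Hospital, Xi'an, Shaanxi 710054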

Python; machine learning; gastric cancer; decision tree; clinical Chinese herbal formulae; treatment pathways

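National Famous Veteran TCM Experts Inheritance Studio Construction Project; Sixth Batch of the National Program for Inheriting the Academic Experience of Veteran TCM Experts; Social Development Science and Technology Research Project of the Shaanxi Provincial Department of Science and Technology; Special Scientific Research Program of the Shaanxi Provincial Department of Education; Shaanxi Provincial University Huang Danian-Style Teaching Team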

国中医药人教发[2016]42号; 国中医药人教发[2017]29号; 2016SF-403; 18JK0225; 陕教函[2023]668号

2024

World Journal of Integrated Traditional and Western Medicine
China Association of Chinese Medicine


CSTPCD
Impact factor: 1.053
ISSN:1673-6613
Year, volume (issue): 2024, 19(2)