Python-Based Machine Learning to Investigate the Regularity of Decision Tree of the Pathway of Chinese Medicine in the Treatment of Gastric Cancer
Objective: Gastric cancer (GC) is a malignancy characterized by high incidence and high mortality. A large body of clinical evidence on Chinese herbal formulae for the treatment of GC indicates that traditional Chinese medicine (TCM) has remarkable efficacy in GC treatment. This study focused on fundamental research into TCM formulae for treating GC, employing established research methodologies and approaches from other disciplines to explore the regularities of TCM across various dimensions.

Methods: Literature on the treatment of GC published up to March 30, 2023 was reviewed using the Chinese National Knowledge Infrastructure (CNKI) database, the Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (TCMSP, http://tcmspw.com/tcmsp.php), the BATMAN-TCM database (http://bionet.ncpsb.org.cn/batman-tcm/index.php/), and the SymMap database (http://www.sym-map.org). A decision tree model was constructed in Python to analyze clinically evidenced GC formulations identified through literature statistics. The high-frequency drugs were studied at the molecular level to form a "TCM-treatment pathway" matrix. The "TCM-treatment pathway" matrix lacking clinical evidence was then input into the decision tree model for learning and refinement, resulting in an association cohort of treatment pathways. Finally, the output of the decision tree model was validated using bioinformatics data analysis.

Results: (1) A total of 26 high-frequency governing Chinese herbs for treating GC were identified, such as Atractylodes macrocephala, Astragalus membranaceus, and Poria cocos. (2) A total of 225 pathways related to GC were found. (3) Using entropy calculation with the decision tree model built in Python, 18 pathways with important effects on GC were identified from the 225 pathways enriched with proteins associated with clinically evidenced TCMs for GC treatment. (4) The decision tree model yielded the pathway cohort "Chemical carcinogenesis-Tryptophan metabolism-PI3K-Akt signaling pathway-Antigen processing and presentation". (5) The performance of the 18 pathways varied across labels. Under the "No" label, TCMs exhibited relatively concentrated distributions of high values (depicted by dark areas), whereas under the "Yes" label, TCMs showed relatively scattered distributions of high values. (6) The clustering results of q values and p values suggested that "Fatty acid biosynthesis" and "Antigen processing and presentation" were associated with "1-Yes", "Vibrio cholerae infection" and "Dilated cardiomyopathy" were associated with "1-No", and "Breast cancer" and "Protein digestion and absorption" were associated with "0-Yes".

Conclusion: The pathway cohort "Chemical carcinogenesis-Tryptophan metabolism-PI3K-Akt signaling pathway-Antigen processing and presentation" represents a correlated sequence within the TCM treatment pathways for GC. The high degree of similarity between the results obtained from bioinformatics analysis and those of the machine learning decision tree model built in Python confirms the accuracy of the decision tree algorithm in determining the therapeutic efficacy of TCMs for GC. Additionally, mature computational models built in Python make it possible to investigate the regularities of clinical TCM formulations in disease treatment from various perspectives. This is a new attempt in clinical TCM formulation research that combines multidisciplinary and multi-angle analyses; it contributes significantly to the modernization of TCM research and has important implications for human health development.
Keywords: Python; Machine Learning; Gastric Cancer; Decision Tree; Clinical Traditional Chinese Medicine Formula; Therapeutic Pathway
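
The entropy-based pathway selection mentioned in the Methods and Results can be illustrated with a minimal sketch. This is not the authors' code: the toy herb rows, their 0/1 pathway values, and the Yes/No labels below are hypothetical, and the helper names `entropy` and `information_gain` are introduced only for illustration. It shows how a decision tree would rank pathways by the reduction in Shannon entropy obtained when splitting on each one.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one binary (0/1) feature."""
    base = entropy(labels)
    remainder = 0.0
    for value in (0, 1):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return base - remainder


# Hypothetical toy "TCM-treatment pathway" matrix: one row per herb,
# 1 if the herb's targets are enriched in the pathway, else 0.
PATHWAYS = [
    "Chemical carcinogenesis",
    "Tryptophan metabolism",
    "PI3K-Akt signaling pathway",
]
rows = [
    {"Chemical carcinogenesis": 1, "Tryptophan metabolism": 1, "PI3K-Akt signaling pathway": 1},
    {"Chemical carcinogenesis": 1, "Tryptophan metabolism": 0, "PI3K-Akt signaling pathway": 1},
    {"Chemical carcinogenesis": 0, "Tryptophan metabolism": 1, "PI3K-Akt signaling pathway": 1},
    {"Chemical carcinogenesis": 0, "Tryptophan metabolism": 0, "PI3K-Akt signaling pathway": 0},
]
labels = ["Yes", "Yes", "No", "No"]  # hypothetical class labels

# Rank pathways by information gain; the top one would become the root split.
gains = {p: information_gain(rows, labels, p) for p in PATHWAYS}
best = max(gains, key=gains.get)

for pathway, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{pathway}: {gain:.4f}")
print("Root split:", best)
```

Applying the same ranking recursively to each resulting subset produces an ordered chain of splits, which is one way a sequence of pathways such as the cohort reported above could emerge from a tree. A library implementation such as scikit-learn's `DecisionTreeClassifier(criterion="entropy")` would be the usual choice on a real matrix.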