Investigation of mechanism of liver metastasis in colorectal cancer and its potential biomarkers based on WGCNA and machine learning algorithms
Objective To explore the molecular mechanisms and potential biomarkers of colorectal cancer with liver metastasis (CRCLM) using weighted gene co-expression network analysis (WGCNA) and machine learning algorithms.
Methods Two CRCLM microarray datasets (GSE6988 and GSE14297) were obtained from the GEO database. After differentially expressed genes (DEGs) in CRCLM were identified, gene ontology (GO) analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis, and gene set enrichment analysis (GSEA) were performed. WGCNA was used to select genes in the module most strongly correlated with CRCLM. Machine learning algorithms, including least absolute shrinkage and selection operator (LASSO) logistic regression and support vector machine-recursive feature elimination (SVM-RFE), were used to identify potential biomarkers of CRCLM. Expression levels of the key genes were compared between the CRCLM group and the control group in GSE6988. Receiver operating characteristic (ROC) curves were drawn for the key genes in diagnosing CRCLM, their diagnostic efficacy was assessed by the area under the curve (AUC), and the results were validated in the GSE14297 dataset.
Results A total of 73 DEGs were identified, comprising 55 upregulated and 18 downregulated genes. Biological function enrichment analysis showed that the DEGs were mainly enriched in pathways related to blood particles and chemokines. WGCNA yielded 5 gene co-expression modules, among which the yellow module, containing 81 genes, showed the strongest correlation with CRCLM (cor = 0.72, P = 2e-14). Among the genes in the yellow module, LASSO logistic regression identified 4 genes (CCL11, SLC26A3, NR4A2, and PLA2G2A) as potential diagnostic biomarkers. The SVM-RFE algorithm selected 19 genes from the DEGs (CRP, HP, ORM2, CYP2E1, CCL11, MMP10, AQP3, SERPINA3, ENO3, HAO1, PLG, ENAM, DGUOK, UBE2Q2, HPX, APOA2, ITIH3, ANGPTL3, and MMP1) as potential diagnostic genes. The gene sets obtained from the LASSO and SVM-RFE algorithms were intersected, and CCL11 (eotaxin) was ultimately identified as a promising biomarker. In both the training and validation sets, CCL11 expression was significantly lower in the CRCLM group than in the control group (P < 0.001). ROC curve analysis showed that the AUCs of CCL11 for diagnosing CRCLM were 0.936 in the training set and 0.997 in the validation set, demonstrating strong predictive ability for prognosis.
Conclusion CCL11 is downregulated in CRCLM and may act as a suppressor in CRCLM, suggesting its potential as a prognostic biomarker. The occurrence and development of CRCLM may be associated with pathways related to the blood microenvironment and chemokines.
colorectal cancer with liver metastasis (CRCLM); weighted gene co-expression network analysis (WGCNA); machine learning algorithm; bioinformatics; CCL11
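
The last two analysis steps in the abstract, intersecting the LASSO and SVM-RFE gene sets and scoring a single gene by AUC, can be sketched in plain Python. This is a minimal illustration and not the authors' pipeline: the two gene lists are copied from the abstract, but the function names (`shared_candidates`, `auc`) and the toy expression values are hypothetical, chosen only to show the arithmetic. Because CCL11 is reported as lower in CRCLM, the sketch negates expression so that a higher score points toward CRCLM.

```python
from typing import Iterable, List, Sequence

# Gene lists as reported in the abstract.
LASSO_GENES = {"CCL11", "SLC26A3", "NR4A2", "PLA2G2A"}
SVM_RFE_GENES = {
    "CRP", "HP", "ORM2", "CYP2E1", "CCL11", "MMP10", "AQP3", "SERPINA3",
    "ENO3", "HAO1", "PLG", "ENAM", "DGUOK", "UBE2Q2", "HPX", "APOA2",
    "ITIH3", "ANGPTL3", "MMP1",
}


def shared_candidates(a: Iterable[str], b: Iterable[str]) -> List[str]:
    """Return the genes selected by both feature-selection methods, sorted."""
    return sorted(set(a) & set(b))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    AUC is the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical expression values (NOT data from the study):
# label 1 = CRCLM, label 0 = control.
toy_expression = [2.1, 2.4, 1.9, 3.0, 5.2, 4.8, 2.8, 5.5]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Lower expression indicates CRCLM, so negate before scoring.
toy_auc = auc([-x for x in toy_expression], toy_labels)

print(shared_candidates(LASSO_GENES, SVM_RFE_GENES))
print(toy_auc)
```

With the lists given in the abstract, the intersection contains only CCL11, which matches the reported result. The AUC printed here describes the toy values alone and says nothing about the study's 0.936 and 0.997.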