Investigation of the mechanism of liver metastasis in colorectal cancer and its potential biomarkers based on WGCNA and machine learning algorithms

Objective To explore the molecular mechanisms and potential biomarkers of colorectal cancer with liver metastasis (CRCLM) using weighted gene co-expression network analysis (WGCNA) and machine learning algorithms, providing a basis for research on the molecular mechanisms of CRCLM.

Methods Two CRCLM microarray datasets (GSE6988 and GSE14297) were collected from the GEO database. After the differentially expressed genes (DEGs) in CRCLM were identified, gene ontology (GO) analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis, and gene set enrichment analysis (GSEA) were performed. WGCNA was used to select the genes in the module most strongly correlated with CRCLM. Two machine learning algorithms, least absolute shrinkage and selection operator (LASSO) logistic regression and support vector machine-recursive feature elimination (SVM-RFE), were used to identify potential biomarkers of CRCLM. The expression levels of the key genes were compared between the CRCLM group and the control group in GSE6988. Receiver operating characteristic (ROC) curves were drawn for the key genes in diagnosing CRCLM, their diagnostic efficacy was assessed by the area under the curve (AUC), and the results were validated in GSE14297.

Results A total of 73 DEGs were identified, including 55 upregulated genes and 18 downregulated genes. Functional enrichment analysis showed that the DEGs were mainly enriched in pathways related to blood microparticles and chemokines. WGCNA yielded 5 gene co-expression modules, of which the yellow module, containing 81 genes, showed the strongest correlation with CRCLM (cor=0.72, P=2e-14). LASSO logistic regression on the yellow-module genes identified 4 genes (CCL11, SLC26A3, NR4A2, and PLA2G2A) as potential diagnostic biomarkers. The SVM-RFE algorithm selected 19 genes from the DEGs (CRP, HP, ORM2, CYP2E1, CCL11, MMP10, AQP3, SERPINA3, ENO3, HAO1, PLG, ENAM, DGUOK, UBE2Q2, HPX, APOA2, ITIH3, ANGPTL3, and MMP1) as potential diagnostic genes. Intersecting the key genes from the LASSO and SVM-RFE algorithms left CCL11 (eotaxin) as the promising biomarker. In both the training and validation sets, CCL11 expression in the CRCLM group was significantly lower than in the control group (P<0.001). ROC curve analysis showed that the AUCs of CCL11 for diagnosing CRCLM were 0.936 in the training set and 0.997 in the validation set, demonstrating strong predictive ability for prognosis.

Conclusion CCL11 is expressed at low levels in CRCLM and may act as a suppressor of CRCLM, suggesting its potential as a prognostic biomarker. The occurrence and development of CRCLM may be associated with the tumor vascular microenvironment and chemokine-related pathways.
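The feature-selection step described in the abstract (LASSO logistic regression, SVM-RFE, intersection of the two gene lists, then a single-gene ROC/AUC check on held-out samples) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses scikit-learn on synthetic data, the gene names (`gene_0`, ...), the penalty strength `C=0.1`, the number of genes kept by RFE, and the helper `single_gene_auc` are all hypothetical, and it does not reproduce the authors' actual software, parameters, or results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (rows = samples, columns = genes).
X, y = make_classification(n_samples=160, n_features=30, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)
genes = np.array([f"gene_{i}" for i in range(X.shape[1])])  # hypothetical names

# Hold out half of the samples to play the role of a validation set.
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.5,
                                          stratify=y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr_s = scaler.transform(X_tr)

# LASSO (L1-penalized) logistic regression: genes with non-zero coefficients survive.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1,
                           random_state=0).fit(X_tr_s, y_tr)
lasso_genes = set(genes[lasso.coef_.ravel() != 0])

# SVM-RFE: repeatedly drop the gene with the smallest linear-SVM weight.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=8, step=1).fit(X_tr_s, y_tr)
svm_genes = set(genes[rfe.support_])

# Candidate biomarkers are the genes selected by both methods.
shared = sorted(lasso_genes & svm_genes)


def single_gene_auc(expression, labels):
    """AUC of one gene's expression as a classifier score.

    A downregulated marker gives a raw AUC below 0.5, so the
    direction-independent value max(auc, 1 - auc) is returned.
    """
    auc = roc_auc_score(labels, expression)
    return max(auc, 1.0 - auc)


for g in shared:
    idx = int(np.where(genes == g)[0][0])
    print(g, "validation AUC:", round(single_gene_auc(X_va[:, idx], y_va), 3))
```

Taking `max(auc, 1 - auc)` is a design choice for the sketch: it lets a gene whose expression is lower in cases, as reported for CCL11, be scored on the same scale as an upregulated one.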

Keywords: colorectal cancer with liver metastasis (CRCLM); weighted gene co-expression network analysis (WGCNA); machine learning algorithm; bioinformatics; CCL11

张平茜、何亚玲、李宇阳、胡诗涵、高波、潘云

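School of Basic Medicine, Dali University, Dali 671000, Yunnan, China

School of Clinical Medicine, Dali University, Dali 671000, Yunnan, China

Department of Pathology, The First Affiliated Hospital of Dali University, Dali 671000, Yunnan, China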

Funding: National Natural Science Foundation of China (grant nos. 82160044 and 81960042)

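Chinese Youjiang Medical Journal (右江医学)
Affiliated Hospital of Youjiang Medical University for Nationalities

Impact factor: 0.779
ISSN: 1003-1383
Year, volume (issue): 2024, 52(6)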