Identification of key genes for ruptured abdominal aortic aneurysm based on bioinformatics and machine learning

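Objective To screen key genes for ruptured abdominal aortic aneurysm (rAAA) using machine learning methods. Methods The GSE98278 dataset was downloaded from the Gene Expression Omnibus (GEO) database and split into a training set and a validation set at a ratio of 7:3. The maximum relevance minimum redundancy (mRMR) algorithm was used to select 500 feature genes; the least absolute shrinkage and selection operator (LASSO), support vector machine recursive feature elimination (SVM-RFE), and random forest (RF) algorithms were then each applied to select feature genes, and the intersection of the three results was taken as the key genes. Receiver operating characteristic (ROC) curves were used to evaluate the diagnostic performance of the key genes for rAAA in the training and validation sets. Differential expression analysis was then performed and a protein-protein interaction (PPI) network was constructed; node genes associated with the key genes (score > 0.02) were selected by random walk with restart (RWR) analysis and subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. Results LASSO, SVM-RFE, and RF selected 16, 10, and 34 feature genes, respectively. Their intersection yielded 5 key genes: CX3C chemokine receptor 1 (CX3CR1), F2R-like thrombin or trypsin receptor 3 (F2RL3), CC chemokine ligand 8 (CCL8), 2'-5'-oligoadenylate synthetase 3 (OAS3), and metallothionein 1X (MT1X). ROC curve analysis showed that, in the training set, the areas under the curve (AUC) of the 5 genes for diagnosing rAAA were 0.94, 0.97, 0.94, 0.95, and 0.95, respectively; in the validation set, the AUCs were 1.00, 0.90, 0.78, 0.90, and 0.90, respectively. RWR analysis yielded 27 node genes. GO enrichment analysis showed that the node genes were enriched in processes such as mineral absorption and viral protein interaction with cytokines and cytokine receptors; KEGG enrichment analysis showed that the node genes were mainly enriched in pathways such as response to virus and cellular response to metal ions. Conclusion The key genes obtained by machine learning may be potential diagnostic biomarkers for rAAA.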
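The selection step in the Methods (three selectors, intersected, then a per-gene ROC check) can be sketched roughly as below. This is only an illustration under stated assumptions: the data are synthetic (not GSE98278), the mRMR pre-filter is omitted, the gene names are placeholders, and every hyperparameter (`alpha`, the number of features kept by RFE, the number of top RF genes) is a guess, since the abstract does not report them. The abstract also does not say which LASSO variant was used; a least-squares `Lasso` on 0/1 labels stands in for it here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (samples x genes); NOT the real dataset.
X, y = make_classification(n_samples=60, n_features=50, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)
genes = np.array([f"gene_{i}" for i in range(X.shape[1])])  # hypothetical names

# 7:3 split into training and validation sets, as described in the abstract.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
X_train_s = StandardScaler().fit_transform(X_train)

# LASSO: keep genes whose coefficient is non-zero (alpha is an assumed value).
lasso = Lasso(alpha=0.05).fit(X_train_s, y_train.astype(float))
lasso_genes = set(genes[lasso.coef_ != 0])

# SVM-RFE: recursively drop the lowest-weight gene of a linear SVM.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=10, step=1)
svm_genes = set(genes[rfe.fit(X_train_s, y_train).support_])

# Random forest: keep the top genes by impurity-based importance (cut-off assumed).
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train_s, y_train)
rf_genes = set(genes[np.argsort(rf.feature_importances_)[::-1][:15]])

# Key genes = genes chosen by all three selectors.
key_genes = sorted(lasso_genes & svm_genes & rf_genes)


def single_gene_auc(values, labels):
    """AUC of one gene used alone as a score.

    Assumption: a down-regulated gene is scored in the reverse direction,
    so the larger of AUC and 1 - AUC is reported.
    """
    auc = roc_auc_score(labels, values)
    return max(auc, 1.0 - auc)


for g in key_genes:
    j = int(np.where(genes == g)[0][0])
    print(g,
          round(single_gene_auc(X_train[:, j], y_train), 2),
          round(single_gene_auc(X_val[:, j], y_val), 2))
```

Taking the intersection is deliberately conservative: a gene survives only if a sparse linear model, a margin-based ranking, and a tree ensemble all select it, which is consistent with 16, 10, and 34 candidates shrinking to 5 key genes.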
Identification of key genes for ruptured abdominal aortic aneurysm based on bioinformatics and machine learning
Objective To screen key genes for ruptured abdominal aortic aneurysm (rAAA) using machine learning methods. Methods First, the GSE98278 dataset was downloaded from the Gene Expression Omnibus (GEO) database and divided into training and validation sets at a ratio of 7:3. The maximum relevance minimum redundancy (mRMR) algorithm was used to select 500 feature genes. Subsequently, least absolute shrinkage and selection operator (LASSO), support vector machine recursive feature elimination (SVM-RFE), and random forest (RF) algorithms were applied to further refine these features, and key genes were identified at their intersection. The diagnostic performance of the key genes for rAAA was evaluated in the training and validation sets using receiver operating characteristic (ROC) curves. Differential expression analysis was then conducted, and a protein-protein interaction (PPI) network was constructed. Node genes related to key genes (with scores > 0.02) were identified through random walk with restart (RWR) analysis, followed by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. Results The LASSO, SVM-RFE, and RF methods identified 16, 10, and 34 feature genes, respectively. By taking the intersection, a total of 5 key genes were identified in this study: CX3C chemokine receptor 1 (CX3CR1), F2R-like thrombin or trypsin receptor 3 (F2RL3), CC chemokine ligand 8 (CCL8), 2'-5'-oligoadenylate synthetase 3 (OAS3), and metallothionein 1X (MT1X). ROC curve analysis showed that in the training set, the area under the curve (AUC) values for the five genes in diagnosing rAAA were 0.94, 0.97, 0.94, 0.95, and 0.95, respectively. In the validation set, the AUC values were 1.00, 0.90, 0.78, 0.90, and 0.90, respectively. RWR analysis identified 28 node genes. GO enrichment analysis indicated that node genes were enriched in processes such as metal ion absorption and interactions between viral proteins and cytokines and cytokine receptors. KEGG enrichment analysis revealed that node genes were mainly enriched in pathways related to viral response and cellular metal ion response. Conclusion The key genes identified using machine learning methods may be potential diagnostic biomarkers for rAAA.
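For readers unfamiliar with random walk with restart, the following is a minimal sketch of how key genes could be used as seeds on a PPI network and how a score cut-off such as 0.02 picks out related nodes. The toy graph, the restart probability of 0.7, and the function name are assumptions made for illustration; the abstract reports only the score threshold, not the network or the RWR settings.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change drops below tol.

    adj     : symmetric adjacency matrix of the network
    seeds   : indices of the seed nodes (here, the key genes)
    restart : probability of jumping back to the seeds (assumed value)
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # isolated nodes: avoid division by zero
    W = adj / col_sums                     # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # walker starts uniformly on the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy network with 6 nodes; node 5 is isolated. Edges are hypothetical.
n = 6
adj = np.zeros((n, n))
for a, b in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]:
    adj[a, b] = adj[b, a] = 1.0

seeds = [0]                                # pretend node 0 is a key gene
scores = random_walk_with_restart(adj, seeds)

# Non-seed nodes whose steady-state score exceeds the 0.02 cut-off.
related = [i for i in range(n) if scores[i] > 0.02 and i not in seeds]
print(np.round(scores, 4), related)
```

Scores decay with network distance from the seeds, so the threshold effectively keeps only genes in the close neighbourhood of the key genes; a higher restart probability makes that neighbourhood tighter.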

Abdominal aortic aneurysm; Machine learning; Bioinformatics

刘仕睿、焦周阳、程帅、单金涛、夏磊、化召辉、李震

Department of Endovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou 450052

Abdominal aortic aneurysm; Machine learning; Bioinformatics

2024

Chinese Journal of Experimental Surgery (中华实验外科杂志)
Chinese Medical Association (中华医学会)

CSTPCD
Impact factor: 0.759
ISSN: 1001-9030
Year, volume (issue): 2024, 41(11)