Identification of key genes for ruptured abdominal aortic aneurysm based on bioinformatics and machine learning
Objective To screen key genes for ruptured abdominal aortic aneurysm (rAAA) using machine learning methods.
Methods The GSE98278 dataset was downloaded from the Gene Expression Omnibus (GEO) database and divided into training and validation sets at a ratio of 7:3. The maximum relevance minimum redundancy (mRMR) algorithm was used to select 500 feature genes. Subsequently, the least absolute shrinkage and selection operator (LASSO), support vector machine recursive feature elimination (SVM-RFE), and random forest (RF) algorithms were applied to further refine these features, and the genes selected by all three methods were taken as key genes. The diagnostic performance of the key genes for rAAA was evaluated in the training and validation sets using receiver operating characteristic (ROC) curves. Differential expression analysis was then conducted, and a protein-protein interaction (PPI) network was constructed. Node genes related to the key genes (score > 0.02) were identified by random walk with restart (RWR) analysis, followed by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis.
Results The LASSO, SVM-RFE, and RF methods identified 16, 10, and 34 feature genes, respectively. Their intersection yielded 5 key genes: CX3C chemokine receptor 1 (CX3CR1), F2R like thrombin or trypsin receptor 3 (F2RL3), CC chemokine ligand 8 (CCL8), 2'-5'-oligoadenylate synthetase 3 (OAS3), and metallothionein 1X (MT1X). ROC curve analysis showed that in the training set, the area under the curve (AUC) values of the five genes for diagnosing rAAA were 0.94, 0.97, 0.94, 0.95, and 0.95, respectively. In the validation set, the AUC values were 1.00, 0.90, 0.78, 0.90, and 0.90, respectively. RWR analysis identified 28 node genes. GO enrichment analysis indicated that the node genes were enriched in processes such as metal ion absorption and the interaction of viral proteins with cytokines and cytokine receptors. KEGG enrichment analysis revealed that the node genes were mainly enriched in pathways related to viral response and cellular metal ion response.
Conclusion The key genes identified using machine learning methods may be potential diagnostic biomarkers for rAAA.
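The RWR step described in the Methods can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the toy network, the placeholder names GENE_A to GENE_D, the choice of two seed genes, and the restart probability of 0.7 are all assumptions made for illustration. Only the score cutoff of 0.02 is taken from the abstract.

```python
def build_adjacency(edges):
    """Build an undirected, unweighted adjacency map from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def random_walk_with_restart(adjacency, seeds, restart_prob=0.7,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- r * p0 + (1 - r) * W p until the L1 change is below tol.

    W is the column-normalised adjacency matrix, p0 is uniform over the
    seed nodes, and r is the restart probability (an assumed value here).
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fraction r of the mass jumps back to the seeds.
        nxt = {n: restart_prob * p0[n] for n in nodes}
        # Walk term: each node spreads the rest evenly over its neighbours.
        for u in nodes:
            degree = len(adjacency[u])
            if degree == 0:
                nxt[u] += (1.0 - restart_prob) * p[u]  # isolated node keeps its mass
                continue
            share = (1.0 - restart_prob) * p[u] / degree
            for v in adjacency[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy PPI network; these edges are invented, not taken from the study.
edges = [
    ("CX3CR1", "GENE_A"),
    ("CCL8", "GENE_A"),
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_D"),
]
seeds = {"CX3CR1", "CCL8"}  # two of the key genes, used as seeds for illustration

scores = random_walk_with_restart(build_adjacency(edges), seeds)

# Keep non-seed nodes whose steady-state score exceeds the 0.02 cutoff.
node_genes = sorted(g for g, s in scores.items() if g not in seeds and s > 0.02)
print(node_genes)
```

In this sketch the scores decay with distance from the seed genes, so the cutoff keeps only nodes close to them in the network. A real analysis would run the same iteration on the full PPI network with all key genes as seeds.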