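Objective To apply bioinformatics analysis and machine learning to gene datasets of rheumatoid arthritis (RA) and screen for key genes that are potential diagnostic and therapeutic targets. Methods RA-related datasets were obtained and differentially expressed genes (DEGs) were screened. Key genes were selected with two machine learning algorithms, least absolute shrinkage and selection operator (LASSO) and multiple support vector machine recursive feature elimination (mSVM-RFE), and receiver operating characteristic (ROC) curves were plotted to evaluate the potential value of the key genes as diagnostic and therapeutic targets. Results A total of 377 DEGs were screened from two databases, of which 266 were up-regulated and 111 were down-regulated. The two machine learning algorithms yielded 6 key genes: HCP5, LRRC15, MREG, SDC1, SLC26A10 and SNX10. ROC curve analysis showed that in the training set the areas under the curve (AUC) of these 6 key genes for diagnosing RA were 0.959, 0.945, 0.878, 0.929, 0.882 and 0.903, respectively, all greater than 0.8. In the validation set the AUCs were 0.821, 0.912, 0.971, 0.997, 0.671 and 0.894, respectively, all greater than 0.8 except for SLC26A10. These results indicate that the key genes have relatively high diagnostic value for RA, with SLC26A10 performing less well in the validation set. Conclusion The key genes obtained by bioinformatics and machine learning analysis may be potential diagnostic markers and precision treatment targets for RA.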
Identification of Key Genes in Rheumatoid Arthritis Based on Bioinformatics and Machine Learning
Objective To analyze gene datasets of rheumatoid arthritis (RA) using bioinformatics and machine learning, and to screen for key genes that are potential diagnostic and therapeutic targets. Methods RA-related datasets were obtained to screen differentially expressed genes (DEGs). Least absolute shrinkage and selection operator (LASSO) and multiple support vector machine recursive feature elimination (mSVM-RFE) were applied to screen key genes, and receiver operating characteristic (ROC) curves were plotted to evaluate the potential value of the key genes as diagnostic and therapeutic targets. Results A total of 377 DEGs were screened from two databases, including 266 up-regulated genes and 111 down-regulated genes. Six key genes were identified by the two machine learning algorithms: HCP5, LRRC15, MREG, SDC1, SLC26A10 and SNX10. ROC curve analysis showed that the areas under the curve (AUC) of the six key genes for RA diagnosis in the training set were 0.959, 0.945, 0.878, 0.929, 0.882 and 0.903, respectively, all above 0.8. The AUCs of the six key genes in the validation set were 0.821, 0.912, 0.971, 0.997, 0.671 and 0.894, respectively, all greater than 0.8 except for SLC26A10. These results indicate that the key genes had high diagnostic value for RA, with SLC26A10 performing less well in the validation set. Conclusion The key genes obtained by bioinformatics analysis and machine learning algorithms may be potential diagnostic markers and precision treatment targets for RA.
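To illustrate how the per-gene AUC values in the Results can be read, the following Python sketch computes an AUC for a single gene using the rank-based (Mann-Whitney) formulation, and then checks the AUCs reported above against the 0.8 threshold. It is a minimal sketch and not the authors' pipeline: the function names and the toy expression values are hypothetical, and only the gene names and the two lists of AUCs are taken from the abstract.

```python
from typing import List, Sequence


def auc_from_expression(case_values: Sequence[float],
                        control_values: Sequence[float]) -> float:
    """Rank-based AUC: the probability that a randomly chosen case has a
    higher expression value than a randomly chosen control (ties count 0.5)."""
    if not case_values or not control_values:
        raise ValueError("both groups must contain at least one sample")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical toy expression values for one gene (not data from the study).
ra_expr = [7.9, 8.4, 6.1, 9.0, 8.8]
control_expr = [6.0, 6.5, 7.0, 5.8, 8.0]
toy_auc = auc_from_expression(ra_expr, control_expr)

# AUCs as reported in the abstract, in the order the genes are listed.
GENES = ["HCP5", "LRRC15", "MREG", "SDC1", "SLC26A10", "SNX10"]
TRAIN_AUC = [0.959, 0.945, 0.878, 0.929, 0.882, 0.903]
VALID_AUC = [0.821, 0.912, 0.971, 0.997, 0.671, 0.894]


def genes_not_above(threshold: float, aucs: Sequence[float]) -> List[str]:
    """Return the genes whose AUC is not greater than the threshold."""
    return [gene for gene, auc in zip(GENES, aucs) if auc <= threshold]


if __name__ == "__main__":
    print(f"toy AUC: {toy_auc:.2f}")
    print("training set, AUC <= 0.8:", genes_not_above(0.8, TRAIN_AUC))
    print("validation set, AUC <= 0.8:", genes_not_above(0.8, VALID_AUC))
```

With the reported values, no gene falls at or below 0.8 in the training set, and only SLC26A10 does in the validation set, which matches the statement in the Results. For a down-regulated gene the rank-based AUC would fall below 0.5 under this orientation; in that case the direction of the comparison would need to be reversed.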