
Machine Learning-Based Key Gene Screening and Prediction Model Construction for Gastric Cancer

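Objective: To verify the genetic characteristics associated with gastric cancer, this study proposes a hybrid feature selection method to identify target genes, analyze their significance and establish a new diagnostic prediction model. Methods: A bioinformatic analysis of variance (ANOVA) was performed on the raw gastric cancer data. Gastric cancer-related genes were then screened with machine learning methods, including random forest, support vector machine recursive feature elimination (SVM-RFE) and the LASSO algorithm, and the intersection of their results was taken as the key gene set. Enrichment analysis was performed, and the key genes were identified and validated. Using the key genes, diagnostic prediction models were built with 8 machine learning classification algorithms, including multilayer perceptron (MLP), logistic regression and decision tree. Results: The key genes selected by the hybrid feature selection method were closely related to the biological processes of tumor initiation and progression. Eight key genes (TXNDC5, BMP8A, ONECUT2, COL10A1, JCHAIN, INHBA, LCTL and TRIM59) were identified as potential gastric cancer markers with good diagnostic performance. Based on the ROC curves and accuracy of the 8 classification models, MLP was the best gastric cancer prediction model, with an accuracy of 97.77%, which is 3.83% higher than that of an XGBoost gastric cancer prediction model built by other researchers. Conclusion: This study identifies 8 key genes for the diagnosis and prevention of gastric cancer and establishes the best-performing prediction model.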
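The abstract does not state the ANOVA cut-off or the settings of the individual selectors, so the following is only a minimal sketch of the hybrid idea in plain Python with no third-party libraries: a one-way ANOVA F statistic pre-filters genes, and the key gene set is the intersection of the genes returned by every selector. The gene names, the F cut-off `f_min` and the helper names (`anova_f`, `anova_prefilter`, `hybrid_select`) are hypothetical. The selector outputs are hard-coded stand-ins for random forest, SVM-RFE and LASSO (in practice one might use scikit-learn's `RandomForestClassifier`, `RFE` wrapped around a linear `SVC`, and `LassoCV`), and a real pipeline would more likely threshold on ANOVA p-values than on a raw F value.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return float("inf")  # groups differ but have no within-group spread
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def anova_prefilter(expr, labels, f_min):
    """Keep genes whose normal-vs-tumor F statistic reaches f_min (hypothetical cut-off)."""
    kept = set()
    for gene, values in expr.items():
        # Split this gene's expression values by class label (e.g. 0 = normal, 1 = tumor).
        groups = [
            [v for v, y in zip(values, labels) if y == cls]
            for cls in sorted(set(labels))
        ]
        if anova_f(groups) >= f_min:
            kept.add(gene)
    return kept


def hybrid_select(*selector_outputs):
    """Key gene set = genes chosen by every selector (set intersection), sorted."""
    return sorted(set.intersection(*map(set, selector_outputs)))


# Toy data (hypothetical): 3 normal (0) and 3 tumor (1) samples.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # shifted in tumor samples
    "G2": [2.0, 3.0, 4.0, 2.0, 3.0, 4.0],  # identical distributions, F = 0
    "G3": [1.0, 1.5, 1.0, 3.0, 3.5, 3.0],  # strongly shifted, low spread
}
candidates = anova_prefilter(expr, labels, f_min=4.0)

# Stand-ins for the genes RF, SVM-RFE and LASSO might return from `candidates`.
rf_genes, svm_rfe_genes, lasso_genes = {"G1", "G3"}, {"G1", "G3"}, {"G1"}
print(sorted(candidates), hybrid_select(rf_genes, svm_rfe_genes, lasso_genes))
```

Taking the intersection is a conservative choice: a gene survives only if every method agrees on it, which trades some sensitivity for a smaller, more stable marker set.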
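The 8 classifiers are compared by ROC curves and accuracy. A minimal sketch of those two metrics in plain Python, assuming binary labels (1 = tumor, 0 = normal) and per-sample predicted tumor probabilities from each classifier. The model names, scores, the 0.5 decision threshold and the helper names are hypothetical, and ranking by AUC first and accuracy second is an illustrative choice rather than the paper's stated procedure.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a random positive (tumor) sample scores
    higher than a random negative (normal) sample; ties count as 1/2.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def rank_models(y_true, model_scores, threshold=0.5):
    """Return (name, AUC, accuracy) rows sorted best first by AUC, then accuracy."""
    table = []
    for name, scores in model_scores.items():
        # Hard labels for accuracy use an assumed probability threshold.
        y_pred = [1 if s >= threshold else 0 for s in scores]
        table.append((name, roc_auc(y_true, scores), accuracy(y_true, y_pred)))
    return sorted(table, key=lambda row: (row[1], row[2]), reverse=True)


# Hypothetical held-out labels and predicted tumor probabilities.
y_test = [0, 0, 1, 1]
scores = {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.2, 0.1, 0.7, 0.9],
}
for name, auc, acc in rank_models(y_test, scores):
    print(f"{name}: AUC={auc:.2f}, accuracy={acc:.2%}")
```

AUC is threshold-free while accuracy depends on the chosen cut-off, which is why reporting both, as the abstract does, gives a fuller picture than either alone.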

Keywords: gastric cancer; gene screening; key gene; bioinformatics; machine learning

Wang Zepeng (王泽朋), Li Kunpeng (李坤鹏), Zhou Yu (周玉), Li Sihai (李四海)


School of Information Engineering, Gansu University of Chinese Medicine, Lanzhou, Gansu 730100, China


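Funding: Gansu Provincial Science and Technology Program (21JR1RA272); Innovation Fund for Higher Education Teachers of the Gansu Provincial Department of Education (2023B-105)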

2024

Chinese Journal of Medical Physics
Southern Medical University; Chinese Society of Medical Physics


CSTPCD
Impact factor: 0.483
ISSN: 1005-202X
Year, volume (issue): 2024, 41(1)