Construction of a nasopharyngeal carcinoma diagnostic prediction model based on mRNA gene expression using support vector machine recursive feature elimination and artificial neural network algorithms
Wang Yiren¹, Wang Hui², Xiang Hongli², Luo Yan³, Yang Lu³, Deng Zhitao³, Pang Haowen⁴, Zhou Ping³, Liu Xinyan
Author information
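1. School of Nursing, Southwest Medical University, Luzhou 646000; Department of Radiology, The Affiliated Hospital of Southwest Medical University, Luzhou 646000
2. School of Nursing, Southwest Medical University, Luzhou 646000
3. Department of Radiology, The Affiliated Hospital of Southwest Medical University, Luzhou 646000
4. Department of Oncology, The Affiliated Hospital of Southwest Medical University, Luzhou 646000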
Abstract
Objective To construct a diagnostic prediction model for nasopharyngeal carcinoma (NPC) based on mRNA gene expression, using support vector machine recursive feature elimination (SVM-RFE) and artificial neural network (ANN) algorithms, and to provide a reference for clinical early screening, intervention, and research into molecular mechanisms.

Methods Microarray and RNA-seq gene expression profiles of 216 patients with nasopharyngeal carcinoma and 248 normal controls were obtained from three public databases: Gene Expression Omnibus (GEO), the International Cancer Genome Consortium (ICGC), and Genotype-Tissue Expression (GTEx). Differential expression analysis was first performed to identify differentially expressed genes associated with NPC, and key gene features were then selected with SVM-RFE. Finally, an NPC diagnostic prediction model was constructed with an ANN, and its accuracy and predictive performance were evaluated on internal and external validation sets.

Results A total of 457 differentially expressed genes were identified. SVM-RFE then selected six key gene features: urokinase-type plasminogen activator (PLAU), SHISA3, matrix metalloproteinase 1 (MMP1), the proline-rich synapse-associated protein SHANK2, coiled-coil domain-containing protein 39 (CCDC39), and MEX3A. The ANN model built on these features achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.970 on the training set and 0.907 on the internal validation set. In external validation, the AUCs on an RNA-seq dataset, a microarray dataset, and an independent dataset combining RNA-seq and microarray data were 0.851, 0.842, and 0.791, respectively.

Conclusion This study identified several potentially important gene features for the diagnosis of nasopharyngeal carcinoma and used them to build a diagnostic prediction model. The model showed good generalization across external validation sets from different data sources (AUC 0.791 to 0.851) and may offer new ideas and a reference for clinical early screening, therapeutic intervention, and research into molecular mechanisms.
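The abstract does not state the SVM kernel, how many features were removed per iteration, or how the final subset size of six was chosen. The sketch below only illustrates the generic linear SVM-RFE loop (retrain, rank features by squared weight, drop the weakest) in dependency-free Python. It assumes a linear soft-margin SVM fitted by full-batch subgradient descent and one feature eliminated per round; the function names `train_linear_svm` and `svm_rfe`, all hyperparameters, and the synthetic data are hypothetical and are not the authors' pipeline.

```python
import random


def train_linear_svm(X, y, lam=0.1, lr=0.1, epochs=200):
    """Fit a soft-margin linear SVM by full-batch subgradient descent on
    (lam / 2) * ||w||^2 + mean hinge loss. Labels in y must be -1 or +1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty (bias is not penalized)
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # only margin violators contribute to the hinge subgradient
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def svm_rfe(X, y, n_keep):
    """SVM-RFE: retrain on the surviving features, drop the one with the smallest
    squared weight, and repeat until n_keep remain. Returns original column indices."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        remaining.pop(weakest)
    return remaining


# Synthetic stand-in for an expression matrix (no real NPC data): columns 0 and 1
# shift with the class label, columns 2-5 are pure noise.
rng = random.Random(0)
y = [1 if i % 2 == 0 else -1 for i in range(200)]
X = [[2.0 * yi + rng.gauss(0, 1) if j < 2 else rng.gauss(0, 1) for j in range(6)]
     for yi in y]

selected = svm_rfe(X, y, n_keep=2)
print("kept feature columns:", selected)
```

In practice, scikit-learn's `sklearn.feature_selection.RFE` wraps an estimator such as a linear-kernel `SVC` in the same elimination loop, and `RFECV` additionally picks the number of retained features by cross-validation.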
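The abstract reports AUC values for the ANN model on training, internal validation, and external sets but does not give the network architecture or training settings. As a hedged illustration only, the sketch below trains a hypothetical one-hidden-layer network (`TinyMLP`, with tanh hidden units, a sigmoid output, and per-sample gradient descent, all assumptions) on synthetic six-feature data split into training and held-out sets. It then scores both sets with the rank-based (Mann-Whitney) form of ROC AUC, where AUC is the probability that a randomly chosen case outscores a randomly chosen control.

```python
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def roc_auc(scores, labels):
    """ROC AUC as P(score of a random positive > score of a random negative),
    i.e. the Mann-Whitney statistic; ties count one half. Labels are 0/1."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


class TinyMLP:
    """Hypothetical one-hidden-layer network: tanh hidden units, sigmoid output,
    trained by per-sample gradient descent on binary cross-entropy."""

    def __init__(self, n_in, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [math.tanh(sum(w * xk for w, xk in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        return h, sigmoid(sum(w * hj for w, hj in zip(self.W2, h)) + self.b2)

    def predict_proba(self, x):
        return self._forward(x)[1]

    def fit(self, X, y, lr=0.05, epochs=100):
        for _ in range(epochs):
            for x, t in zip(X, y):
                h, out = self._forward(x)
                d_out = out - t  # d(cross-entropy)/d(output logit) for a sigmoid unit
                for j, hj in enumerate(h):
                    # chain rule through tanh, using W2[j] before it is updated
                    d_h = d_out * self.W2[j] * (1.0 - hj * hj)
                    self.W2[j] -= lr * d_out * hj
                    for k, xk in enumerate(x):
                        self.W1[j][k] -= lr * d_h * xk
                    self.b1[j] -= lr * d_h
                self.b2 -= lr * d_out
        return self


# Synthetic six-feature data standing in for six gene expression values.
rng = random.Random(1)
labels = [i % 2 for i in range(200)]
data = [[(0.7 if t == 1 else -0.7) + rng.gauss(0, 1) for _ in range(6)] for t in labels]
X_train, y_train = data[:150], labels[:150]
X_val, y_val = data[150:], labels[150:]

model = TinyMLP(n_in=6).fit(X_train, y_train)
train_auc = roc_auc([model.predict_proba(x) for x in X_train], y_train)
val_auc = roc_auc([model.predict_proba(x) for x in X_val], y_val)
print(f"training AUC {train_auc:.3f}, validation AUC {val_auc:.3f}")
```

Because AUC depends only on how the scores are ranked, the same helper can score any held-out cohort once model outputs are available; scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity. Any normalization needed to apply one model across microarray and RNA-seq data is a separate step not covered by this sketch.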