Abstract
OBJECTIVE To construct machine-learning-based models for predicting the prognosis of aneurysmal subarachnoid hemorrhage (aSAH). METHODS Clinical data from 326 patients with aSAH treated at Tianjin Huanhu Hospital between October 2020 and September 2021 were collected retrospectively. The data were randomly split 7:3 into a training set (used to build the prediction models) and a test set (used to evaluate them). The synthetic minority over-sampling technique (SMOTE) was used to handle class imbalance, and least absolute shrinkage and selection operator (Lasso) analysis was used to select the optimal features. Based on these features, prediction models were built with eight machine learning algorithms: logistic regression (LR), BP neural network, K-nearest neighbors (KNN), random forest (RF), decision tree (DT), support vector machine (SVM), naive Bayes (NB), and XGBoost. RESULTS Lasso regression selected 11 optimal features. The accuracies of the LR, BP neural network, KNN, RF, DT, SVM, NB, and XGBoost models were 0.847, 0.847, 0.816, 0.867, 0.806, 0.827, 0.745, and 0.816, and the AUCs were 0.784, 0.794, 0.646, 0.821, 0.499, 0.765, 0.737, and 0.676, respectively. CONCLUSION Machine learning models performed well in predicting aSAH prognosis, with the RF model performing best.
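The SMOTE step named in the methods can be illustrated with a minimal sketch. This is not the authors' implementation (the abstract does not say which software was used). It is a plain-Python version of the core idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. The function names `smote` and `balance`, the toy data, and the choice to oversample only the training rows are all assumptions made for illustration.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows
    and their k nearest minority-class neighbours (illustrative sketch)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base row, searched within the minority class only
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    new_rows = smote(minority, max(n_new, 0), k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


# Toy training data, invented for illustration: 6 majority rows, 3 minority rows.
X_train = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.4], [0.4, 0.1],
           [2.0, 2.0], [2.2, 1.9], [1.9, 2.3]]
y_train = [0] * 6 + [1] * 3

X_bal, y_bal = balance(X_train, y_train)
print(y_bal.count(0), y_bal.count(1))  # 6 6
```

In this sketch the test rows are never passed to `balance`, so the evaluation set would keep its original class distribution; whether the study applied SMOTE before or after the 7:3 split is not stated in the abstract.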