Research on heavy metal exposure and stroke risk prediction based on interpretable machine learning models

Objective: To compare heavy metal levels between stroke and non-stroke populations, and to build an interpretable machine learning model that uses heavy metal levels to predict stroke risk.
Methods: Population data from the National Health and Nutrition Examination Survey (NHANES), 2009 to 2018, were balanced by random downsampling. All samples were then randomly split into training and test sets at a 7:3 ratio. Three machine learning models based on different principles were built and trained: a support vector machine, a random forest, and a gradient boosting tree. Model performance was evaluated with the receiver operating characteristic (ROC) curve and the area under the curve (AUC), together with accuracy, precision, sensitivity, and specificity. Finally, Shapley values from game theory were used to quantify each feature's contribution to the model, improving interpretability and clarifying the role of heavy metal levels in the prediction.
Results: Univariate analysis showed statistically significant differences between the stroke and non-stroke groups in age, race, education level, family poverty-income ratio, diabetes, hypertension, and hypercholesterolemia (all P<0.001), whereas gender did not differ significantly between the two groups (P>0.05). Among the three models, the random forest performed best (AUC 0.8087). By SHAP value, the heavy metals ranked in descending order of contribution were lead, cadmium, mercury, and manganese, and the heat map showed that the heavy metals driving the prediction varied from sample to sample.
Conclusion: Heavy metal levels differ significantly between the stroke and non-stroke groups and can be used to build interpretable machine learning models for stroke prediction.
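The data preparation and evaluation steps named in the Methods can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code: the function names (`downsample`, `split_train_test`, `confusion_metrics`, `auc`), the fixed random seed, and the toy labels and scores are all assumptions made for the example, and no NHANES data or study results are reproduced.

```python
import random

def downsample(rows, label_of, seed=0):
    """Randomly drop rows from the larger class until both classes match in size."""
    rng = random.Random(seed)
    pos = [r for r in rows if label_of(r) == 1]
    neg = [r for r in rows if label_of(r) == 0]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    return balanced

def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into training and test sets (7:3 by default)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def confusion_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity and specificity from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    ratio = lambda a, b: a / b if b else float("nan")  # avoid division by zero
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
    }

def auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy, made-up labels and scores (not study data).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption
metrics = confusion_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)
```

The pairwise definition of AUC used here is equivalent to the area under the ROC curve and keeps the sketch dependency-free. In practice a library routine would normally be used, and the classifiers themselves (support vector machine, random forest, gradient boosting tree) would come from a machine learning library.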

Stroke; Machine learning; Heavy metals; Game theory; SHAP value
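The game-theoretic Shapley value behind the feature attribution can be illustrated with an exact computation on a toy value function. This is a sketch under stated assumptions: the function `shapley_values`, the value function `toy_value`, and every weight in it are hypothetical and are not estimates from the study, which does not state which implementation was used. The metal names are borrowed only to make the example concrete.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, features):
    """Exact Shapley values for a coalition value function value(frozenset) -> float."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when it joins coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

# Hypothetical weights, NOT estimates from the study.
WEIGHTS = {"lead": 0.4, "cadmium": 0.3, "mercury": 0.2, "manganese": 0.1}

def toy_value(coalition):
    """Toy model output when only the features in `coalition` are known."""
    score = sum(WEIGHTS[f] for f in coalition)
    if "lead" in coalition and "cadmium" in coalition:
        score += 0.2  # made-up interaction term, shared equally by Shapley
    return score

phi = shapley_values(toy_value, list(WEIGHTS))
```

The interaction bonus is split evenly between the two features that create it, and the attributions sum to the difference between the full-model output and the empty-coalition output. Exact enumeration grows exponentially with the number of features, so practical tools rely on approximations or model-specific algorithms.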

许冬、刘聪慧、苏芳慧、邱思峥、童会霞

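Department of Neuroelectrophysiology, Anyang People's Hospital, Anyang, Henan 455000, China

Department of Neurology, Anyang People's Hospital, Anyang, Henan 455000, China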

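Henan Provincial Medical Science and Technology Research Program, Joint Construction Project (LHGJ20230857); Anyang Key Research, Development and Promotion Special Project (2023C01-SF218)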

2024

Modern Medicine Journal of China (中国现代医药杂志)
Beijing Aerospace General Hospital

Impact factor: 0.689
ISSN: 1672-9463
Year, volume (issue): 2024, 26(3)