Predicting stroke risk from heavy metal exposure using interpretable machine learning models
Objective To compare heavy metal content between stroke and non-stroke populations, and to construct an interpretable machine learning model that predicts stroke risk from heavy metal content.
Methods Data from the National Health and Nutrition Examination Survey (NHANES) for 2009 to 2018 were balanced by random downsampling. All samples were randomly divided into training and testing sets in a 7:3 ratio. Three machine learning models based on different principles were constructed and trained: support vector machine, random forest, and gradient boosting tree. The receiver operating characteristic (ROC) curve was plotted and the area under the curve (AUC) was calculated; accuracy, precision, sensitivity, and specificity were also used to evaluate model performance. Finally, Shapley values from game theory were used to quantify the contribution of each feature to the model's output, so as to improve the interpretability of the model and to analyze the role of heavy metal content in its predictions.
Results Univariate analysis revealed significant differences (P<0.001) between the stroke and non-stroke groups in age, race, education level, family poverty-income ratio, diabetes, hypertension, and hypercholesterolemia. Gender did not differ significantly between the two groups (P>0.05). Among the three models, the random forest model performed best, with an AUC of 0.8087. According to the SHAP values, the heavy metals ranked in descending order of contribution to the prediction were lead, cadmium, mercury, and manganese. Furthermore, the heat map showed that the heavy metals contributing to the prediction differed from sample to sample.
Conclusion Heavy metal content differs significantly between the stroke and non-stroke groups, and it can be used to construct interpretable machine learning models for stroke prediction.
Stroke; Machine learning; Heavy metals; Game theory; SHAP value
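The data preparation and evaluation steps named in the Methods (random downsampling to a balanced dataset, a 7:3 train/test split, and accuracy, precision, sensitivity, and specificity) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the `stroke` label key, the fixed seeds, and the toy records are all hypothetical, and no NHANES data or fitted model is involved.

```python
import random


def downsample(rows, label_key="stroke", seed=0):
    """Randomly keep equal numbers of cases (label 1) and controls (label 0).

    `label_key` is a hypothetical field name chosen for this sketch.
    """
    rng = random.Random(seed)
    cases = [r for r in rows if r[label_key] == 1]
    controls = [r for r in rows if r[label_key] == 0]
    n = min(len(cases), len(controls))  # size of the minority class
    balanced = rng.sample(cases, n) + rng.sample(controls, n)
    rng.shuffle(balanced)
    return balanced


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true, y_pred):
    """Compute the four metrics from binary labels (1 = stroke, 0 = no stroke)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": _ratio(tn, tn + fp),  # true negative rate
    }


if __name__ == "__main__":
    # Toy, made-up records: 20 cases and 180 controls.
    records = [{"id": i, "stroke": 1 if i < 20 else 0} for i in range(200)]
    balanced = downsample(records)
    train, test = split_train_test(balanced)
    print(len(balanced), len(train), len(test))
    print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

In a full pipeline the predicted labels would come from a trained classifier such as a random forest; here they are supplied by hand so that the metric definitions can be checked in isolation.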