Influencing Factors of Elderly Death Based on Machine Learning and Logistic Regression Analysis
Objective To identify risk factors for mortality among Chinese adults aged over 65 years, and to construct an individualized risk prediction model. Methods Using data from the China Health and Retirement Longitudinal Study (CHARLS) from 2015 to 2020, older adults were selected as study participants, and information on demographics, lifestyle, disease history, and physical examination was collected. The data set was randomly divided into a training set and a validation set at a ratio of 7:3. Three classification algorithms, logistic regression, random forest, and extreme gradient boosting (XGBoost), were used to build prediction models for death among older adults, and the best model was selected for interpretability analysis. Results The study enrolled 5,505 older adults, of whom 977 died. Among the three models, the XGBoost model showed good predictive performance (area under the curve [AUC]: 0.756, 95% CI: 0.720-0.792). The Shapley additive explanations (SHAP) analysis showed that body mass index, age, gender, education level, and marital status were the five most influential factors for death among older adults. Conclusion The XGBoost model predicts the risk of death in older adults well, and SHAP provides a clear explanation of individualized death risk predictions.
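The evaluation steps named in the Methods and Results (a random 7:3 split, then an AUC with a 95% confidence interval) can be illustrated with a minimal, standard-library-only sketch. The abstract does not state how the confidence interval was obtained, so the percentile bootstrap used here is an assumption, and the function names `split_7_3`, `auc`, and `bootstrap_auc_ci` are hypothetical helpers for illustration, not the study's code. The scores in the usage example are synthetic, not CHARLS data.

```python
import random


def split_7_3(n, seed=0):
    """Randomly split indices 0..n-1 into training (70%) and validation (30%) sets."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = n * 7 // 10  # integer arithmetic avoids floating-point rounding surprises
    return idx[:cut], idx[cut:]


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Synthetic validation-set labels and predicted risks, for illustration only.
    rng = random.Random(42)
    labels = [1 if rng.random() < 0.18 else 0 for _ in range(300)]
    scores = [rng.gauss(0.6 if y == 1 else 0.4, 0.2) for y in labels]
    print("AUC:", auc(labels, scores))
    print("95% CI:", bootstrap_auc_ci(labels, scores, n_boot=200))
```

In practice the predicted risks would come from the fitted logistic regression, random forest, or XGBoost model applied to the validation set; the sketch only shows how the reported summary statistic and its interval could be computed from those predictions.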