Relationship between the occurrence of type 2 diabetes mellitus and exposure to polybrominated diphenyl ethers, and construction and evaluation of a prediction model
Objective To analyse the relationship between the occurrence of type 2 diabetes mellitus (T2DM) and exposure to polybrominated diphenyl ethers (PBDEs), and to construct and evaluate a prediction model for the occurrence of T2DM using machine learning methods.
Methods A total of 1,425 study subjects were selected from the NHANES database, including 1,132 participants without T2DM and 293 patients with T2DM. The clinical data of the two groups were compared, and the variables with statistically significant differences were carried forward to Boruta feature selection to clarify the relationship between T2DM occurrence and PBDEs and to identify the influencing factors. The selected influencing factors were entered into R, and the data were randomly partitioned into an 80% training set and a 20% validation set using the createDataPartition function. Seven algorithms were used to construct machine learning models: logistic regression (Logistic), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), adaptive boosting (AdaBoost), K-nearest neighbours (KNN), naive Bayes (CNB), and support vector machine (SVM). Each model was trained on the training set and then applied to the validation set, with internal validation by ten-fold cross-validation. The models were evaluated using the ROC curve and AUC, and the model with the best prediction performance was selected for external validation. External validation was performed on 71 adult T2DM patients and 100 adults undergoing health check-ups from the Department of Endocrinology of the First Affiliated Hospital of Xinjiang Medical University. The SHAP tool was used to analyse the interpretability of the high-performance prediction model and to assess the importance of each feature in its decision-making process.
Results BMI, waist circumference, education level, the proportion with a family history of diabetes, serum HDL, and serum BDE-28, BDE-47, BDE-99, BDE-183 and BDE-209 concentrations were higher in T2DM patients than in non-T2DM participants (all P<0.05). Boruta feature selection identified waist circumference, BMI, family history of diabetes, and serum BDE-47, BDE-99, BDE-28, BDE-209 and BDE-183 as influencing factors for the occurrence of T2DM, and these were incorporated into the machine learning algorithms to construct the prediction models. The XGBoost model had the highest AUC in both the training set and the internal validation set, and ranked among the top models in accuracy, Kappa value, sensitivity and specificity, so it was chosen as the high-performance prediction model. On external validation, the XGBoost model had an accuracy of 0.702, a sensitivity of 0.549, a specificity of 0.787, and an AUC (95% CI) of 0.674 (0.575-0.773). SHAP analysis of the XGBoost predictions showed that waist circumference and serum BDE-47 were the most important predictive features; serum BDE-99, serum BDE-209, BMI and family history of diabetes also had high importance, whereas serum BDE-28 and BDE-183 had relatively low importance.
Conclusions Serum BDE-47, BDE-99, BDE-28, BDE-209 and BDE-183 are influencing factors for the occurrence of T2DM. The XGBoost model based on serum PBDEs, waist circumference, BMI and family history of diabetes mellitus has high predictive efficacy for the occurrence of T2DM and is of value for its prediction.
type 2 diabetes mellitus; polybrominated diphenyl ethers; polybrominated diphenyl ether congeners; machine learning; prediction model
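The evaluation measures named in the abstract (accuracy, Kappa value, sensitivity, specificity and AUC) can be illustrated with a minimal Python sketch. This is not the authors' R code: the function names `confusion_counts`, `classification_metrics` and `auc_rank` are hypothetical, the 0.5 decision threshold is an assumption, and the ten labels and scores are toy values invented purely for illustration, not study data.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FN, TN, FP for binary labels (1 = T2DM, 0 = non-T2DM)."""
    counts = {"tp": 0, "fn": 0, "tn": 0, "fp": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            counts["tp" if pred == 1 else "fn"] += 1
        else:
            counts["tn" if pred == 0 else "fp"] += 1
    return counts


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and Cohen's kappa from a 2x2 table."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Agreement expected by chance, from the row and column marginals.
    p_yes = ((tp + fn) / n) * ((tp + fp) / n)
    p_no = ((tn + fp) / n) * ((tn + fn) / n)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


def auc_rank(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case; tied scores count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, NOT from the study): 4 cases, 6 controls.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.35, 0.2, 0.2, 0.1, 0.05]
# Assumed decision threshold of 0.5 on the predicted probability.
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_metrics = classification_metrics(**confusion_counts(toy_labels, toy_pred))
toy_auc = auc_rank(toy_labels, toy_scores)
print(toy_metrics, toy_auc)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarises ranking quality across all thresholds, which is one reason a model can show moderate AUC alongside unbalanced sensitivity and specificity.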