A study on the risk prediction model for cryptogenic stroke in patients with right-to-left shunt
Objective: To predict the risk of cryptogenic stroke (CS) in patients with right-to-left shunt (RLS) using machine learning, and to provide a potential approach for accurate and efficient prediction of CS.
Methods: Clinical data were retrospectively analyzed for 289 subjects with positive RLS on contrast-enhanced transcranial Doppler (c-TCD) testing who were treated in the Department of Neurology at Laoshan Campus, the Affiliated Hospital of Qingdao University, from January 2018 to September 2023. The data included demographic information, medical history, laboratory test indicators, diagnosis, and treatment. The dataset was randomly divided into a training set and a testing set at a ratio of 8:2 with the machine learning function train_test_split(). Risk prediction models for CS in RLS subjects were constructed with algorithms including logistic regression, decision trees, random forests, extreme gradient boosting, artificial neural networks, gradient boosting, extra trees, and adaptive boosting. Model performance was evaluated by receiver operating characteristic (ROC) curves, area under the curve (AUC), confusion matrix, precision, recall, accuracy, F1 score, calibration curves, and decision curve analysis. The optimal model was subjected to interpretability analysis using feature importance and SHAP values. The t-test, Mann-Whitney U test, and χ² test were used for data analysis in SPSS 25.0. The DeLong test was used to compare the difference in AUC between two models.
Results: Of the 289 RLS subjects, 166 (57.5%) had CS and 123 (42.5%) did not. Statistical analysis showed that blood biochemical indicators such as D-dimer, mean platelet volume, and fibrinogen were higher in CS patients than in non-CS patients (all P<0.01). There were no statistically significant differences in variables between the training and testing sets (all P>0.05). In the testing set, the random forest model achieved the highest AUC (0.885), precision (0.806), recall (0.879), accuracy (0.810), and F1 score (0.841) for CS risk prediction. The calibration curve of the random forest model was closest to the reference line, and the decision curve analysis indicated that it had a greater net benefit. The interpretability analysis revealed that high-risk factors included mean platelet volume, D-dimer, international normalized ratio, body mass index, and age.
Conclusion: The random forest-based prediction tool exhibits excellent performance and predicts CS risk in the RLS population with high accuracy.
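As an illustration of how the reported precision, recall, accuracy, and F1 score relate to a confusion matrix, the following minimal Python sketch computes the four metrics from raw counts. The counts are hypothetical: the abstract does not report the confusion matrix, and the values below were chosen only because they sum to 58 (an assumed testing-set size of about 20% of 289 subjects) and reproduce the reported metrics after rounding. The function name classification_metrics is likewise an illustrative choice, not part of the study's code.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)   # share of predicted CS cases that are true CS
    recall = tp / (tp + fn)      # share of true CS cases that were detected
    accuracy = (tp + tn) / total # share of all subjects classified correctly
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts (NOT reported in the abstract), assuming a 58-subject testing set.
metrics = classification_metrics(tp=29, fp=7, fn=4, tn=18)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts the sketch gives a precision of 0.806, a recall of 0.879, an accuracy of 0.810, and an F1 score of 0.841. Other confusion matrices may be consistent with the same rounded values.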
Keywords: Cryptogenic stroke; Right-to-left shunt; Machine learning; Predictive model; Random forest model