Construction and evaluation of risk prediction model for non-suicidal self-injury of middle school students
Objective To construct a non-suicidal self-injury (NSSI) risk prediction model for middle school students using different machine learning algorithms and to evaluate its effectiveness, so as to provide guidance for the prevention and control of NSSI on campus.
Methods In March 2023, a total of 3 372 middle and high school students from schools in Nanchang, Fuzhou and Shangrao cities in Jiangxi Province were selected by combining stratified random cluster sampling with convenience sampling. Questionnaire surveys were conducted using a general information questionnaire, the Self-esteem Scale, the Ottawa Self-injury Scale, the Social Support Assessment Scale, the Chinese Version of the Olweus Bullying Questionnaire, the Event Attribution Style Scale, the Adolescent Resilience Scale and the Adolescent Life Events Scale. Data were divided into a training set (n=2 361) and a test set (n=1 011) at a ratio of 7:3, and variables were selected based on univariate analysis and LASSO regression results. Four machine learning algorithms, namely random forest, support vector machine, logistic regression and XGBoost, were used to construct NSSI risk prediction models, and model performance was evaluated and compared using the area under the curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value and F1 score.
Results The detection rate of NSSI among middle school students was 34.4%. Univariate analysis showed statistically significant differences in NSSI detection rates by grade, gender, registered residence location, whether students were class cadres, and four types of bullying (physical, verbal, relational and cyberbullying) (χ²=27.17, 15.81, 11.54, 4.63; 68.22, 140.63, 77.81, 13.95; all P<0.05). With NSSI as the dependent variable, LASSO regression was used for variable screening and identified 10 predictive variables: grade level, self-esteem, subjective support, support utilization, verbal bullying, emotional control, interpersonal relationships, punishment, loss of relatives and property, and health and adaptation issues. The AUC values of the random forest, support vector machine, logistic regression and XGBoost models were 0.76, 0.76, 0.76 and 0.77, respectively, with no statistically significant differences in pairwise comparisons (Z=-0.59 to 0.82, P>0.05). Sensitivity values were 0.62, 0.61, 0.62 and 0.61, respectively. Specificity values were 0.74, 0.78, 0.78 and 0.78, respectively. Positive predictive values were 0.56, 0.59, 0.60 and 0.59, respectively. Negative predictive values were 0.79, 0.79, 0.80 and 0.79, respectively. F1 scores were 0.59, 0.60, 0.61 and 0.60, respectively.
Conclusions All four NSSI risk prediction models perform well, with the logistic regression model slightly outperforming the others on positive predictive value, negative predictive value and F1 score. Schools and parents should pay attention to the predictive factors for NSSI, so as to reduce the occurrence of NSSI among middle school students.
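The reported metrics are linked arithmetically: positive predictive value, negative predictive value and F1 score all follow from sensitivity, specificity and the NSSI rate in the evaluated sample. The minimal Python sketch below illustrates that relationship using the figures reported above. It is not the authors' code. The function name `predictive_values` is hypothetical, and the sketch assumes the test-set NSSI rate equals the overall detection rate of 34.4%, which the abstract does not state.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV, F1) implied by sensitivity, specificity and prevalence.

    Cell values are expressed as proportions of the whole sample.
    """
    tp = sensitivity * prevalence              # true positives
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return ppv, npv, f1


# (sensitivity, specificity) per model, as reported in the Results.
reported = {
    "random forest": (0.62, 0.74),
    "support vector machine": (0.61, 0.78),
    "logistic regression": (0.62, 0.78),
    "XGBoost": (0.61, 0.78),
}

# Assumption: the test set has the same NSSI rate as the full sample (34.4%).
PREVALENCE = 0.344

implied = {
    name: predictive_values(se, sp, PREVALENCE)
    for name, (se, sp) in reported.items()
}

for name, (ppv, npv, f1) in implied.items():
    print(f"{name}: PPV={ppv:.2f}  NPV={npv:.2f}  F1={f1:.2f}")
```

Under that assumption, the implied values can be compared directly with the reported positive predictive values, negative predictive values and F1 scores as a consistency check. Because sensitivity and specificity are similar across the four models, the derived metrics necessarily lie close together.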