Applied research on the impact of air pollution on absenteeism among students with respiratory symptoms through machine learning analysis
Objective To explore the performance of machine learning prediction models in forecasting short-term student absenteeism due to respiratory symptoms caused by air pollution, with the aim of providing a methodological reference for early warning systems for diseases in schools.

Methods Using short-term time series of student absenteeism due to respiratory symptoms in Jiangsu Province from September 2019 to October 2022, the study integrated average concentrations of atmospheric pollutants. A univariate distributed lag nonlinear model (DLNM) was used to select the optimal lag variable for each pollutant. An extreme gradient boosting (XGBoost) model was developed to predict the frequency of absenteeism due to respiratory symptoms and was compared with a seasonal autoregressive integrated moving average model with exogenous factors (SARIMAX).

Results Between 2019 and 2022, an average of 9 709 students per day in Jiangsu Province were absent due to respiratory symptoms. The daily average air quality index (AQI) was 76.96, and the mean mass concentrations of PM2.5, PM10, NO2, and O3 were 35.75, 61.13, 28.89, and 104.81 μg/m³, respectively. Granger causality tests indicated that AQI, PM2.5, PM10, NO2, and O3 were significant predictors of the frequency of absenteeism due to respiratory symptoms (F = 1.46, 1.79, 1.67, 3.41, and 2.18, respectively; P < 0.01). The single-day lag effects of PM2.5, PM10, NO2, and O3 reached their peak relative risk (RR) at lag 4, lag 0, lag 0, and lag 4, respectively. When these optimal lag variables were incorporated, the XGBoost model outperformed the SARIMAX model, reducing the mean absolute error (MAE) from 2.251 to 0.475, the mean absolute percentage error (MAPE) from 0.429 to 0.080, and the root mean square error (RMSE) from 2.582 to 0.713. At the 75th-percentile (P75) alert threshold, sensitivity improved from 0.086 to 0.694 and specificity from 0.979 to 0.988, with the Youden index increasing from 0.065 to 0.682.

Conclusions The XGBoost model shows robust predictive performance and effective early warning capability for short-term series of student absenteeism due to respiratory symptoms caused by air pollution. Schools could adopt this model to detect disease outbreaks early and take timely control measures, thereby strengthening school health management.
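The Methods pair each pollutant with the single-day lag at which its relative risk peaked (lag 4 for PM2.5 and O3, lag 0 for PM10 and NO2). The sketch below shows one way such lagged predictors could be assembled into per-day feature rows for a gradient-boosting model. It is illustrative only: the helper name `build_lag_features`, the `OPTIMAL_LAGS` mapping, the column naming, and the choice to skip (rather than impute) early days lacking a lagged value are assumptions, not the authors' pipeline, and the toy concentrations are made up.

```python
from typing import Dict, List, Sequence, Tuple

# Optimal single-day lags (in days) reported in the abstract for each pollutant.
OPTIMAL_LAGS: Dict[str, int] = {"PM2.5": 4, "PM10": 0, "NO2": 0, "O3": 4}


def build_lag_features(
    series: Dict[str, Sequence[float]],
    lags: Dict[str, int],
) -> List[Tuple[int, Dict[str, float]]]:
    """Pair each target day t with every pollutant's value from day t - lag.

    Hypothetical helper: `series` maps pollutant names to equal-length daily
    values; days too early to have every required lag are skipped, not imputed.
    """
    lengths = {len(values) for values in series.values()}
    if len(lengths) != 1:
        raise ValueError("all pollutant series must have the same length")
    n_days = lengths.pop()
    max_lag = max(lags.values())
    rows = []
    for t in range(max_lag, n_days):
        # Column names encode both the pollutant and the lag used, e.g. "O3_lag4".
        features = {f"{name}_lag{lag}": series[name][t - lag] for name, lag in lags.items()}
        rows.append((t, features))
    return rows


# Made-up toy values (μg/m³) for six consecutive days, for illustration only.
toy = {
    "PM2.5": [30, 32, 35, 40, 38, 36],
    "PM10": [55, 58, 60, 66, 64, 61],
    "NO2": [25, 27, 28, 30, 29, 28],
    "O3": [100, 98, 105, 110, 108, 103],
}
for day, features in build_lag_features(toy, OPTIMAL_LAGS):
    print(day, features)
```

Rows like these would form part of the design matrix passed to a regressor such as XGBoost; the abstract does not list any other features the authors used, so none are added here.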
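The Results compare the two models with three error metrics and with alert-based sensitivity, specificity, and Youden index (sensitivity + specificity - 1, which reproduces the reported values: 0.086 + 0.979 - 1 = 0.065 and 0.694 + 0.988 - 1 = 0.682). The following is a minimal sketch of these calculations under stated assumptions: MAPE is expressed as a fraction (matching the 0.429 and 0.080 figures), a day counts as an alert when its value is at or above the threshold, and the P75 threshold is taken as the 75th percentile of the observed series using linear interpolation. The abstract does not specify the comparison rule or which data period defines P75, so those choices and all function names here are hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence


def regression_errors(obs: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """MAE, MAPE (as a fraction, not a percentage) and RMSE."""
    if len(obs) != len(pred) or not obs:
        raise ValueError("obs and pred must be non-empty and of equal length")
    n = len(obs)
    mae = sum(abs(o - p) for o, p in zip(obs, pred)) / n
    # MAPE is undefined when an observed value is 0; this sketch assumes obs != 0.
    mape = sum(abs(o - p) / abs(o) for o, p in zip(obs, pred)) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse}


def p75_threshold(values: Sequence[float]) -> float:
    """75th percentile with linear interpolation (the 'inclusive' method)."""
    return statistics.quantiles(values, n=4, method="inclusive")[2]


def alert_metrics(obs: Sequence[float], pred: Sequence[float], threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity and Youden index for 'value >= threshold' alerts."""
    tp = fn = fp = tn = 0
    for o, p in zip(obs, pred):
        actual, flagged = o >= threshold, p >= threshold
        if actual and flagged:
            tp += 1
        elif actual:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return {"sensitivity": sensitivity, "specificity": specificity,
            "youden": sensitivity + specificity - 1}


# Made-up daily absence counts and predictions, for illustration only.
observed = [4.0, 5.0, 6.0, 8.0, 10.0, 12.0, 7.0, 5.0]
predicted = [4.5, 5.0, 5.5, 9.0, 9.5, 11.0, 6.0, 5.5]
print(regression_errors(observed, predicted))
print(alert_metrics(observed, predicted, p75_threshold(observed)))
```

Defining the threshold on the observed series keeps the alert label independent of the model being evaluated, so both XGBoost and SARIMAX forecasts would be scored against the same cut-off.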
Key words Air pollution; Respiratory system; Absenteeism; Models, statistical; Students