
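Application of machine learning algorithms to studying the impact of air pollution on student absenteeism due to respiratory symptoms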

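Objective: To explore how well machine learning prediction models perform on short-term series of student absenteeism due to respiratory symptoms caused by air pollution. The aim is to provide a methodological reference for early warning of disease occurrence in schools.

Methods: The study used short-term series data on student absenteeism due to respiratory symptoms in Jiangsu Province from September 2019 to October 2022, integrated with data on mean air pollutant concentrations. Univariate distributed lag nonlinear models were used to select the optimal lag variable for each air pollutant. An extreme gradient boosting (XGBoost) model was built to predict the frequency of absenteeism due to respiratory symptoms and was compared with a seasonal autoregressive integrated moving average model with exogenous variables (SARIMAX).

Results: From 2019 to 2022, an average of 9 709 students per day in Jiangsu Province were absent due to respiratory symptoms. The daily mean air quality index (AQI) was 76.96. The daily mean mass concentrations of PM2.5, PM10, NO2 and O3 were 35.75, 61.13, 28.89 and 104.81 μg/m³, respectively. Granger causality tests showed that AQI, PM2.5, PM10, NO2 and O3 were all predictors of the absenteeism frequency series (F = 1.46, 1.79, 1.67, 3.41 and 2.18, respectively; all P < 0.01). The single-day lag RR values of PM2.5, PM10, NO2 and O3 peaked at lag4, lag0, lag0 and lag4, respectively. Compared with the SARIMAX model, the XGBoost model incorporating the optimal pollutant lag variables performed better on all three error measures. It reduced the mean absolute error (MAE) from 2.251 to 0.475, the mean absolute percentage error (MAPE) from 0.429 to 0.080, and the root mean square error (RMSE) from 2.582 to 0.713. At a warning threshold of P75, XGBoost also outperformed SARIMAX on the warning measures. Sensitivity rose from 0.086 to 0.694, specificity from 0.979 to 0.988, and the Youden index from 0.065 to 0.682.

Conclusions: The XGBoost model shows good predictive performance and early-warning effect for short-term series of student absenteeism due to respiratory symptoms caused by air pollution. Schools could adopt this model in a timely manner to detect disease epidemics early for warning, prevention and control, and to improve school health work.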
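The abstract reports one optimal single-day lag per pollutant: PM2.5 at lag4, PM10 at lag0, NO2 at lag0 and O3 at lag4. The sketch below shows how such lags could be turned into model inputs. It assumes consecutive daily series with no gaps. `build_lagged_dataset`, `OPTIMAL_LAGS` and the feature names are hypothetical. The abstract does not say which other predictors, if any, entered the XGBoost model, and this sketch does not reproduce the distributed lag nonlinear model used to choose the lags.

```python
from typing import Dict, List, Sequence, Tuple

# Optimal single-day lags (in days) taken from the abstract: the RR peaked at
# lag4 for PM2.5 and O3 and at lag0 (same day) for PM10 and NO2.
OPTIMAL_LAGS: Dict[str, int] = {"PM2.5": 4, "PM10": 0, "NO2": 0, "O3": 4}


def build_lagged_dataset(
    absences: Sequence[float],
    pollutants: Dict[str, Sequence[float]],
    lags: Dict[str, int],
) -> Tuple[List[Dict[str, float]], List[float]]:
    """Pair each day's absence count with pollutant values observed `lag` days earlier.

    Assumes consecutive daily series of equal length. The first max(lag) days lack
    enough history and are dropped so that features and targets stay aligned.
    """
    n_days = len(absences)
    if any(len(series) != n_days for series in pollutants.values()):
        raise ValueError("every pollutant series must cover the same days as absences")
    max_lag = max(lags.values())
    features: List[Dict[str, float]] = []
    targets: List[float] = []
    for t in range(max_lag, n_days):
        # Feature name encodes the pollutant and the lag that was applied.
        features.append(
            {f"{name}_lag{lag}": pollutants[name][t - lag] for name, lag in lags.items()}
        )
        targets.append(absences[t])
    return features, targets


if __name__ == "__main__":
    # Toy six-day series for illustration; not data from the study.
    absences = [10, 12, 11, 15, 14, 13]
    pollutants = {
        "PM2.5": [30, 35, 40, 45, 50, 55],
        "PM10": [60, 61, 62, 63, 64, 65],
        "NO2": [20, 21, 22, 23, 24, 25],
        "O3": [100, 101, 102, 103, 104, 105],
    }
    X, y = build_lagged_dataset(absences, pollutants, OPTIMAL_LAGS)
    print(X[0], y[0])  # {'PM2.5_lag4': 30, 'PM10_lag0': 64, 'NO2_lag0': 24, 'O3_lag4': 100} 14
```

The resulting rows could then be passed to a gradient-boosting regressor such as `xgboost.XGBRegressor`. The abstract does not give the study's hyperparameters or train/test split, so they are not reproduced here.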
Applied research of the impact of air pollution on absenteeism in students with respiratory issues through machine learning analysis
Objective: To explore the performance of machine learning prediction models in forecasting short-term series of student absenteeism due to respiratory symptoms caused by air pollution. The aim is to provide a methodological reference for early warning systems for diseases in schools.

Methods: The study used short-term series data on student absenteeism due to respiratory symptoms in Jiangsu Province from September 2019 to October 2022, integrated with average concentrations of atmospheric pollutants. A univariate distributed lag nonlinear model was employed to select the optimal lag variable for each pollutant. An extreme gradient boosting (XGBoost) model was developed to predict the frequency of absenteeism due to respiratory symptoms. It was compared with a seasonal autoregressive integrated moving average model with exogenous factors (SARIMAX).

Results: Between 2019 and 2022, an average of 9 709 students per day in Jiangsu Province were absent due to respiratory symptoms. The daily average air quality index (AQI) was 76.96. The mean mass concentrations of PM2.5, PM10, NO2 and O3 were 35.75, 61.13, 28.89 and 104.81 μg/m³, respectively. Granger causality tests indicated that AQI, PM2.5, PM10, NO2 and O3 were significant predictors of the frequency of absenteeism due to respiratory symptoms (F = 1.46, 1.79, 1.67, 3.41 and 2.18; P < 0.01). The single-day lag effects of PM2.5, PM10, NO2 and O3 reached their peak relative risk (RR) values at lag4, lag0, lag0 and lag4, respectively. With these optimal pollutant lag variables incorporated, the XGBoost model outperformed the SARIMAX model on all three error measures. It reduced the mean absolute error (MAE) from 2.251 to 0.475, the mean absolute percentage error (MAPE) from 0.429 to 0.080, and the root mean square error (RMSE) from 2.582 to 0.713. At the P75 alert threshold, sensitivity improved from 0.086 to 0.694 and specificity from 0.979 to 0.988. The Youden index increased from 0.065 to 0.682.

Conclusions: The XGBoost model exhibits robust predictive performance and effective early-warning capability for short-term series of student absenteeism due to respiratory symptoms caused by air pollution. Schools could adopt this model in a timely manner to detect and control disease outbreaks early, thereby strengthening school health management.
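The accuracy metrics (MAE, MAPE, RMSE) and early-warning statistics (sensitivity, specificity, Youden index) in the abstract follow standard definitions. The Youden index is sensitivity + specificity - 1, and the reported values are consistent with this: 0.694 + 0.988 - 1 = 0.682. The sketch below rests on assumptions that the abstract does not state:

- the P75 threshold is the 75th percentile of the observed series, computed with linear interpolation;
- a day counts as an alert when its value exceeds that threshold.

All function names are hypothetical. MAPE is returned as a fraction, matching the 0.080 and 0.429 values reported.

```python
import math
import statistics
from typing import Dict, Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> int:
    """Return the series length after checking both series are non-empty and aligned."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return len(actual)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    n = _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction; undefined if any actual value is 0."""
    n = _check(actual, predicted)
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    n = _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def p75_threshold(values: Sequence[float]) -> float:
    """75th percentile with linear interpolation (assumed definition of P75)."""
    return statistics.quantiles(values, n=4, method="inclusive")[2]


def alert_performance(
    actual: Sequence[float], predicted: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity and Youden index of predicted alerts vs. observed alerts."""
    _check(actual, predicted)
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        observed, flagged = a > threshold, p > threshold  # assumed alert rule
        if observed and flagged:
            tp += 1
        elif flagged:
            fp += 1
        elif observed:
            fn += 1
        else:
            tn += 1
    # Raises ZeroDivisionError if there are no observed alert days or no non-alert days.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
    }


if __name__ == "__main__":
    # Toy data only; these are not the study's observations.
    observed = [1, 2, 3, 4, 5, 6, 7, 8]
    forecast = [1, 2, 3, 4, 5, 7, 6, 8]
    threshold = p75_threshold(observed)
    print(mae(observed, forecast), mape(observed, forecast), rmse(observed, forecast))
    print(threshold, alert_performance(observed, forecast, threshold))
```

Whether the study derived P75 from the training period, the test period or the full series is not stated in the abstract. That choice changes the alert days and therefore the sensitivity and specificity.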

Keywords: Air pollution; Respiratory system; Absenteeism; Models, statistical; Students

Cao Chengbin (曹承斌), Yang Wenyi (杨文漪), Yu Xiaojin (余小金), Wang Yan (王艳), Yang Jie (杨婕)


School of Public Health, Southeast University, Nanjing, Jiangsu 210009

Institute of Child and Adolescent Health Promotion, Jiangsu Provincial Center for Disease Control and Prevention


Funding: Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX22_0076)

2024

Chinese Journal of School Health (中国学校卫生)
Chinese Preventive Medicine Association (中华预防医学会)


CSTPCD; Peking University Core Journal (北大核心)
Impact factor: 1.423
ISSN:1000-9817
Year, volume (issue): 2024, 45(6)