Comparison of three machine learning models for air quality level prediction: a case study of Baoding

Objective: To construct models that predict the air quality level in Baoding, China, for each of the next three days using support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) methods separately, and to select the best of the three models through parameter tuning and comparison of the prediction results.

Methods: Based on daily average air pollutant concentration monitoring data and concurrent meteorological data for Baoding from 2014 to 2022, SVM, RF, and MLP models were built to predict the air quality level on each of the next three days from the data of the previous four days, and the importance of the feature variables was assessed. The model parameters were tuned, 10-fold cross-validation was used for validation, and model performance was evaluated with indicators including accuracy and the area under the curve (AUC).

Results: For the SVM model, the accuracy for the next three days was 69.8%, 63.5%, and 62.3%, respectively, and the AUC values were 77.4, 70.8, and 70.7, respectively. For the RF model, the accuracy was 75.9%, 68.2%, and 67.1%, respectively, with AUC values of 0.84, 0.74, and 0.72, respectively. For the MLP model, the accuracy was 73.2%, 66.4%, and 65.7%, respectively, and the AUC values were 0.83, 0.74, and 0.73, respectively. Overall, the RF model performed best. The importance of the air quality feature variables was higher than that of the meteorological feature variables.

Conclusion: In this comparison, the RF model predicted the next day's air pollution level relatively effectively and can provide early warnings of air quality levels.
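The abstract describes predicting each of the next three days from the previous four days and validating with 10-fold cross-validation. The sketch below illustrates only that data layout and fold structure, using the Python standard library. The function names, the toy two-column feature rows, and the majority-class baseline are all hypothetical stand-ins: the abstract does not list the actual feature variables, the tuned hyperparameters, or whether the folds were shuffled or stratified, and the baseline is not one of the SVM, RF, or MLP models.

```python
from collections import Counter


def make_lagged_samples(features, levels, n_lags=4, horizon=1):
    """Pair the n_lags days ending on day t with the level on day t + horizon.

    features: one list of numeric values per day (hypothetical columns).
    levels:   one air quality level per day.
    """
    X, y = [], []
    for t in range(n_lags - 1, len(features) - horizon):
        window = features[t - n_lags + 1 : t + 1]
        # Flatten the four daily rows into a single feature vector.
        X.append([value for day in window for value in day])
        y.append(levels[t + horizon])
    return X, y


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) for k contiguous, near-equal folds.

    Assumes n_samples >= k so that no fold is empty.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def majority_baseline_accuracy(y, k=10):
    """Mean k-fold accuracy when always predicting the most common training label.

    This is a placeholder for a real classifier, used only to show
    where a model would be fitted and scored inside each fold.
    """
    scores = []
    for train, test in k_fold_indices(len(y), k):
        majority = Counter(y[i] for i in train).most_common(1)[0][0]
        scores.append(sum(y[i] == majority for i in test) / len(test))
    return sum(scores) / len(scores)


# Toy data: 30 days, two made-up feature columns, levels cycling through 1..3.
days = 30
features = [[float(d), float(d % 7)] for d in range(days)]
levels = [1 + (d % 3) for d in range(days)]

for horizon in (1, 2, 3):  # one dataset per forecast day, as in the abstract
    X, y = make_lagged_samples(features, levels, n_lags=4, horizon=horizon)
    print(horizon, len(X), len(X[0]), round(majority_baseline_accuracy(y), 3))
```

Building a separate target vector per horizon mirrors the abstract's statement that a model was constructed for each of the three forecast days. With time-ordered data, contiguous folds like these keep neighbouring days together; whether the study did this is not stated.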

Keywords: machine learning; air pollution; support vector machine (SVM); random forest (RF); multilayer perceptron (MLP)

Authors: Liu Jie (刘婕), Hao Shuxin (郝舒欣), Wan Hongyan (万红燕), Liu Yue (刘悦), Xu Dongqun (徐东群)


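China CDC Key Laboratory of Environment and Population Health / National Institute of Environmental Health, Chinese Center for Disease Control and Prevention, Beijing 100021, China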

Zhongda Hospital, Southeast University


Funding: National Natural Science Foundation of China, General Program (grant no. 21677136)

2024

Journal of Environmental Hygiene
Chinese Center for Disease Control and Prevention


CSTPCD
Impact factor: 0.735
ISSN: 2095-1906
Year, volume (issue): 2024, 14(3)