Journal of Environmental Hygiene, 2024, Vol. 14, Issue 3: 264-269, 272. DOI: 10.13421/j.cnki.hjwsxzz.2024.03.013

Comparison of three machine learning models for air quality level prediction: a case study of Baoding

Liu Jie¹, Hao Shuxin¹, Wan Hongyan², Liu Yue¹, Xu Dongqun¹
Author information

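  • 1. China CDC Key Laboratory of Environment and Population Health / National Institute of Environmental Health, Chinese Center for Disease Control and Prevention, Beijing 100021
  • 2. Zhongda Hospital, Southeast University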

Abstract

Objective To construct air quality level prediction models for each of the next three days in Baoding, China, independently using the support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP), and to select the optimal model of the three by tuning parameters and comparing the prediction results. Methods Based on daily average concentration monitoring data for air pollutants and concurrent meteorological data in Baoding from 2014 to 2022, SVM, RF, and MLP models were constructed to forecast the air quality level for each of the next three days from the data of the previous four days, and the importance of the feature variables was assessed. The model parameters were tuned, and the models were validated with 10-fold cross-validation. Model performance was evaluated using indicators including accuracy and the area under the curve (AUC). Results For the SVM model, the accuracy rates for the next three days were 69.8%, 63.5%, and 62.3%, respectively, and the AUC values were 0.774, 0.708, and 0.707, respectively. For the RF model, the accuracy rates were 75.9%, 68.2%, and 67.1%, respectively, with AUC values of 0.84, 0.74, and 0.72. For the MLP model, the accuracy rates were 73.2%, 66.4%, and 65.7%, respectively, with AUC values of 0.83, 0.74, and 0.73. Overall, the RF model performed best. Air quality feature variables were more important than meteorological feature variables. Conclusion The comparison shows that the RF model can predict the next-day air pollution level relatively effectively and can provide early warning of air quality levels.
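The abstract describes the setup only in outline: predictors taken from the previous four days, a separate model for each of the next three days, 10-fold cross-validation, and scoring by accuracy and AUC. The sketch below illustrates that evaluation scaffolding in plain Python. It is not the authors' code, and several details are assumptions not stated in the source: the folds are contiguous and unshuffled, the multi-level AUC is a macro-averaged one-vs-rest AUC, the data are synthetic, and a hypothetical persistence baseline stands in for the SVM, RF and MLP models. All function names are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple

LAG_DAYS = 4   # predictors: the previous four days (from the abstract)
N_FOLDS = 10   # 10-fold cross-validation (from the abstract)


def make_lagged_dataset(daily_features: Sequence[Sequence[float]],
                        daily_levels: Sequence[int], horizon: int,
                        lag: int = LAG_DAYS) -> Tuple[List[List[float]], List[int]]:
    """One dataset per forecast day (horizon 1, 2, 3 = next three days).

    Each row concatenates the features of days t-lag .. t-1; the label is the
    level on day t + horizon - 1.
    """
    X: List[List[float]] = []
    y: List[int] = []
    for t in range(lag, len(daily_levels) - horizon + 1):
        row: List[float] = []
        for d in range(t - lag, t):
            row.extend(daily_features[d])
        X.append(row)
        y.append(daily_levels[t + horizon - 1])
    return X, y


def k_fold_indices(n: int, k: int = N_FOLDS) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds.

    Assumption: unshuffled blocks; the abstract does not say how data were split.
    """
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of days whose predicted level matches the observed level."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def binary_auc(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC = probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int], proba: Sequence[Sequence[float]],
                  classes: Sequence[int]) -> float:
    """Assumed averaging: mean one-vs-rest AUC over the levels present."""
    aucs = []
    for c_idx, c in enumerate(classes):
        is_pos = [label == c for label in y_true]
        if all(is_pos) or not any(is_pos):
            continue  # AUC undefined when a level is absent or the only one present
        aucs.append(binary_auc(is_pos, [row[c_idx] for row in proba]))
    if not aucs:
        raise ValueError("no level has both positive and negative samples")
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in data (NOT the Baoding data): [pollutant, temperature]
    # per day, with a level 1..6 derived from the pollutant value.
    feats = [[rng.uniform(0, 300), rng.uniform(-10, 35)] for _ in range(200)]
    levels = [min(6, 1 + int(f[0] // 50)) for f in feats]
    X, y = make_lagged_dataset(feats, levels, horizon=1)
    accs = []
    for _train, test in k_fold_indices(len(y)):
        # Hypothetical persistence baseline in place of SVM/RF/MLP: predict the
        # level implied by the most recent lagged day's pollutant value.
        preds = [min(6, 1 + int(X[i][-2] // 50)) for i in test]
        accs.append(accuracy([y[i] for i in test], preds))
    print(f"mean 10-fold accuracy (persistence baseline): {sum(accs) / len(accs):.3f}")
```

To move toward the comparison the abstract reports, the baseline would be replaced by fitted classifiers trained on each fold's `train` indices, for example scikit-learn's `SVC(probability=True)`, `RandomForestClassifier` and `MLPClassifier`, whose `predict_proba` output can feed `macro_ovr_auc`, with the loop repeated for `horizon=2` and `horizon=3`. Whether the authors used this library or this averaging scheme is not stated in the source.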

Key words

machine learning / air pollution / support vector machine (SVM) / random forest (RF) / multilayer perceptron (MLP)

Funding

General Program of the National Natural Science Foundation of China (21677136)

Year of publication

2024
Journal of Environmental Hygiene
Chinese Center for Disease Control and Prevention

CSTPCD
Impact factor: 0.735
ISSN:2095-1906
Number of references: 19