Air Quality Assessment Based on Ensemble Learning
Air quality is an important indicator for ecological and environmental protection. Assessing it requires considering several factors together, including PM2.5, PM10, SO2, and O3. This paper therefore proposes SPRF (Secondary Proximity Random Forest), an optimized random forest model for air quality assessment. To address class imbalance, the air quality samples are undersampled, and decision trees are built using the Gini index. When the base classifiers are constructed, K Nearest Neighbors (KNN) and Quadratic Discriminant Analysis (QDA) are added as base classifiers in the random forest ensemble; following the Bagging approach, their classification results are included in the vote to improve the accuracy and stability of the assessment model. Because every decision tree otherwise carries the same weight in the vote, the tree weights are optimized using the chi-square test. Experiments use 2022 air quality data from cities across China. The results show that, compared with models such as decision trees and multilayer perceptrons, SPRF achieves higher accuracy, precision, recall, and F1 score.
Air quality assessment; Bagging; Random forest; Feature selection
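The pipeline described in the abstract (undersampling, heterogeneous base classifiers, weighted voting) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The function names `undersample`, `chi2_statistic`, and `weighted_vote` are hypothetical, the base classifiers (Gini trees, KNN, QDA) are represented only by made-up prediction lists, and using the Pearson chi-square statistic between a classifier's validation predictions and the true labels as its vote weight is an assumption, since the abstract does not say how the chi-square test enters the weighting.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop samples so every class is as small as the rarest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def chi2_statistic(predicted, actual):
    """Pearson chi-square statistic of the predicted-vs-actual contingency table."""
    n = len(actual)
    joint = Counter(zip(predicted, actual))
    row = Counter(predicted)
    col = Counter(actual)
    stat = 0.0
    for p in row:
        for a in col:
            expected = row[p] * col[a] / n
            stat += (joint[(p, a)] - expected) ** 2 / expected
    return stat


def weighted_vote(model_predictions, weights):
    """Combine per-model label lists into one label per sample by weighted voting."""
    n_samples = len(model_predictions[0])
    result = []
    for i in range(n_samples):
        score = defaultdict(float)
        for preds, w in zip(model_predictions, weights):
            score[preds[i]] += w
        result.append(max(score, key=score.get))
    return result


# Hypothetical validation labels and base-classifier outputs (illustrative only).
actual = ["good", "good", "poor", "poor", "good", "poor"]
tree_pred = ["good", "good", "poor", "poor", "good", "poor"]
knn_pred = ["good", "poor", "poor", "poor", "good", "good"]
qda_pred = ["poor", "good", "good", "poor", "good", "good"]

# Assumed weighting: stronger association with the true labels gives a larger vote.
weights = [chi2_statistic(p, actual) for p in (tree_pred, knn_pred, qda_pred)]
ensemble_pred = weighted_vote([tree_pred, knn_pred, qda_pred], weights)

# Balance a toy imbalanced sample set (four "good" days, two "poor" days).
balanced_x, balanced_y = undersample([1, 2, 3, 4, 5, 6],
                                     ["good", "good", "good", "good", "poor", "poor"])
```

In this toy data the classifier whose predictions are independent of the true labels receives a weight of zero and does not influence the vote. A chi-square statistic measures association rather than correctness, so a practical scheme would likely need an additional check that a highly weighted classifier is actually accurate.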