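A complete spatio-temporal PM2.5 dataset is key to air pollution control. However, PM2.5 data acquired in real time commonly contain missing values caused by machine failures, human errors, atmospheric conditions, and other factors. Existing methods for reconstructing missing values do not adequately account for the daily periodicity of PM2.5 or its complex relationships with influencing factors. To address these problems, this paper proposes a method for reconstructing missing values at PM2.5 stations that considers daily periodicity (Daily Periodicity-Based Spatial-Temporal Interpolation, DP-STF). DP-STF first takes daily observation data as the processing unit and, based on spatio-temporal correlation, screens the optimal spatio-temporal neighborhood for each missing location. It then uses P-BSHADE (Point Estimation Model of Biased Sentinel Hospital-based Area Disease Estimation), which accounts for spatio-temporal heterogeneity, to produce an initial spatio-temporal estimate of the missing data in an iterative manner. Finally, it uses Stacking ensemble machine learning to fit the complex nonlinear spatio-temporal relationships between PM2.5 and its influencing factors, and applies the fitted model to estimate the missing PM2.5 data. Taking hourly PM2.5 station data for Beijing-Tianjin-Hebei in 2020 as the research object, the missing data are reconstructed with DP-STF and the results are compared with those of 7 classical methods. The experimental results show that, compared with the traditional methods, DP-STF achieves the best accuracy: its average RMSE and MAE are reduced by at least 39.83% and 40.12%, respectively, and its R2 is improved by at least 5.56%. In addition, DP-STF effectively captures extreme PM2.5 values, greatly improving prediction accuracy in spatio-temporally non-stationary regions.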
Reconstruction of Missing Values at PM2.5 Monitoring Sites Considering Daily Periodicity
As one of the main components of air pollutants, PM2.5 seriously affects human health, causing respiratory system damage and increasing the incidence of cancer and cardiovascular diseases. A complete spatio-temporal PM2.5 dataset is key to air pollution control. However, current PM2.5 datasets often contain missing values due to machine failures, human errors, atmospheric conditions, and other factors. Existing methods for reconstructing missing values fail to fully consider the daily periodicity and spatio-temporal heterogeneity of PM2.5 and its complex nonlinear relationships with influencing factors. To address these problems, this paper proposes a Daily Periodicity-based Spatial-Temporal Interpolation method (DP-STF) to reconstruct missing PM2.5 measurements. The method first takes daily observation data as the processing unit and, based on spatio-temporal correlation, screens the optimal spatial stations and time series for each missing location. It then uses the P-BSHADE (Point Estimation Model of Biased Sentinel Hospital-based Area Disease Estimation) method to produce an initial spatio-temporal estimate of the missing values, iteratively accounting for spatio-temporal heterogeneity and for temporal and spatial periodicity. Finally, Stacking ensemble machine learning is used to fit the complex nonlinear spatio-temporal relationships of PM2.5: the initial spatio-temporal estimates and the PM2.5 influencing factors drive the training of the Stacking model, which is then used to estimate the missing PM2.5 data. Using hourly PM2.5 station data for Beijing-Tianjin-Hebei in 2020 as the research object, the missing data are reconstructed with DP-STF and the results are compared with those of seven classical methods. The experimental results show that DP-STF achieves the highest accuracy: its average RMSE and MAE are reduced by at least 39.83% and 40.12%, respectively, and its R2 is improved by at least 5.56%. Additionally, the method effectively captures extreme PM2.5 values, significantly increasing prediction accuracy in spatio-temporally non-stationary regions.
Keywords: PM2.5; missing value reconstruction; daily periodicity; integrated machine learning; air pollution; spatio-temporal interpolation; spatio-temporal heterogeneity
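The first two stages summarized in the abstract (correlation-based screening of neighbors on a daily unit, followed by an initial estimate of the gaps) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `pearson`, `select_neighbors`, and `initial_estimate`, the parameter `k`, the toy six-value "day", and the requirement that neighbor series are complete and strictly positive are all hypothetical. In particular, the ratio-adjusted weighted mean below is a simplified stand-in and is not the P-BSHADE estimator; the Stacking stage is omitted.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def select_neighbors(target_day, candidate_days, k=2):
    """Rank candidate daily series by correlation with the target, using only
    the hours the target observed, and keep the k best positive ones."""
    observed = [h for h, v in enumerate(target_day) if v is not None]
    t = [target_day[h] for h in observed]
    scored = [(pearson(t, [s[h] for h in observed]), name)
              for name, s in candidate_days.items()]
    scored.sort(reverse=True)
    return [(name, r) for r, name in scored[:k] if r > 0]

def initial_estimate(target_day, candidate_days, neighbors):
    """Fill gaps with a correlation-weighted, ratio-adjusted mean of neighbors.
    NOTE: a simplified stand-in for illustration, not P-BSHADE.
    Assumes neighbor series are complete and strictly positive."""
    filled = list(target_day)
    w_sum = sum(r for _, r in neighbors)
    if w_sum == 0:
        return filled  # no usable neighbor: leave the gaps untouched
    observed = [h for h, v in enumerate(target_day) if v is not None]
    t_mean = sum(target_day[h] for h in observed) / len(observed)
    for h, v in enumerate(target_day):
        if v is None:
            est = 0.0
            for name, r in neighbors:
                s = candidate_days[name]
                s_mean = sum(s[i] for i in observed) / len(observed)
                # Rescale the neighbor to the target's level, then weight by r.
                est += (r / w_sum) * s[h] * (t_mean / s_mean)
            filled[h] = est
    return filled

# Toy example: one target "day" with a gap and three hypothetical neighbors.
target = [10.0, 12.0, None, 16.0, 18.0, 20.0]
candidates = {
    "A": [20.0, 24.0, 28.0, 32.0, 36.0, 40.0],  # same shape, twice the level
    "B": [30.0, 25.0, 22.0, 18.0, 15.0, 11.0],  # opposite trend
    "C": [11.0, 13.0, 15.0, 17.0, 19.0, 21.0],  # same shape, small offset
}
neighbors = select_neighbors(target, candidates, k=2)
filled = initial_estimate(target, candidates, neighbors)
```

In the method described above, such initial estimates would then be combined with the influencing factors as inputs to the Stacking model; that step is left out here because the abstract does not specify the base learners or features.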