Summary
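Temperature is the most fundamental physical quantity in the study of climate evolution, and the completeness and accuracy of its daily series are of great significance for climate analysis and assessment. In recent years, with the deployment of large numbers of unattended, densely distributed surface automatic weather stations, gaps have appeared more and more often in meteorological records. These gaps are "double random": they occur at random stations and have random lengths, and they pose considerable obstacles to climate analysis and operational applications. To address the shortcomings of existing schemes for interpolating meteorological data, a new secondary interpolation method for daily temperature data based on dynamic time warping (DTW) is proposed. The method adopts a real-time interpolation strategy, and its main technical content includes: 1) using a univariate linear regression equation to decompose the original observed temperature time series into a fitted straight line and a residual curve, and recombining the two into a new temperature series; 2) giving the definition of the temperature interpolation zone and the conditions for interpolation; 3) proposing a new way of computing the distance between stations with dynamic time warping. The method was subjected to a double random test using observed temperature data for Shandong Province in 2021. The results show that the method can meet the interpolation needs of daily mean, daily maximum and daily minimum temperature data; that combining the DTW distance measure with secondary interpolation in the interpolation workflow performs better than the currently common combinations based on the geographical proximity of stations; and that the method is somewhat sensitive to terrain, performing better in plain or hilly areas than in mountainous areas.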
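The decomposition in step 1) can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of temperature against the day index, which is one plain reading of "univariate linear regression"; the function name `decompose_linear` and the sample values are hypothetical and are not taken from the paper.

```python
import numpy as np


def decompose_linear(series):
    """Split a 1-D series into a fitted straight line and a residual curve.

    Assumption: the regressor is simply the day index 0, 1, 2, ...
    """
    y = np.asarray(series, dtype=float)
    t = np.arange(y.size, dtype=float)
    # np.polyfit returns coefficients from the highest degree down.
    slope, intercept = np.polyfit(t, y, deg=1)
    trend = slope * t + intercept
    residual = y - trend
    return trend, residual


# Hypothetical daily mean temperatures (deg C), for illustration only.
temps = [3.1, 4.0, 5.2, 4.8, 6.1, 7.0, 6.5]
trend, residual = decompose_linear(temps)

# Adding the two components back together recovers the original series.
reconstructed = trend + residual
```

Keeping the line and the residual as separate arrays is what makes it possible to recombine components from different series later, which is the role the decomposition appears to play in the secondary interpolation.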
Abstract
Temperature is the most fundamental physical quantity in research on climate evolution, and the completeness and accuracy of daily temperature series are of great significance for climate analysis and assessment. In recent years, with the deployment of a large number of unattended, densely distributed surface automatic weather stations, the probability of observation interruptions or data quality anomalies caused by instrument failures, communication interruptions, natural disasters and similar factors has greatly increased. The resulting missing data are "double random", that is, they occur at random stations and have random lengths, and they can seriously hinder climate analysis and operational applications. At present, there are two main technical solutions for interpolating missing daily temperature data. The first is the climate-correlation statistical solution based on historical data, which selects an optimal reference sequence to substitute for the missing data. This solution has high interpolation accuracy but low timeliness. The second is the spatial interpolation scheme, which can interpolate in real time, but its performance depends heavily on the density of the station network and its stability is poor. To address the shortcomings of these existing solutions, this paper proposes a new real-time interpolation method for missing daily temperature data based on dynamic time warping (DTW). The method adopts a secondary (two-pass) interpolation strategy, and its main technical content is as follows. (1) The core technical route is to decompose the temperature series into a fitted straight line and a residual curve using a univariate linear regression equation, and then to recombine the two to reconstruct the temperature series. (2) The method defines the temperature interpolation zone and the conditions for interpolation: a continuous run of missing data is called the interpolation zone (Zone B). If the left neighbor sequence (Zone A) and the right neighbor sequence (Zone C) each have the same length as Zone B, and the data in Zones A and C are complete, then the missing data in Zone B are considered interpolable. (3) The method uses dynamic time warping to calculate the DTW distance between two time series, so that the optimal reference station for interpolation can be determined in real time. (4) The method comprises two interpolation passes: the primary interpolation directly fills Zone B with data from the reference station, and the secondary interpolation separates the result of the primary interpolation into a fitted straight line and a residual curve and reconstructs the series by cross-combining them. The method is evaluated with a double random test on temperature data collected in Shandong Province in 2021. In the test, observed true values are blanked out to simulate missing data in Zone B, and the error between the interpolated values and the observed true values is calculated after interpolation. The test follows a condition-combination coverage approach, recording the interpolation results for every combination of interpolation pass (primary or secondary) and distance measure (DTW distance or geographic distance, the latter including horizontal distance and altitude difference). The results show that the proposed method can meet the interpolation needs of daily mean, daily maximum and daily minimum temperature data. The combination of the DTW distance measure and secondary interpolation performs better than the commonly used combinations based on the geographical proximity of stations. The method is somewhat sensitive to terrain, performing better in plain or hilly areas than in mountainous ones. The secondary interpolation mechanism proposed in this paper has broad application prospects for handling missing meteorological data with double random characteristics, and it may also serve as a useful reference for the homogenization of long historical meteorological series.
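The zone conditions in (2), the DTW distance in (3) and the primary interpolation in (4) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation. In particular, comparing stations by summing the DTW distances over Zones A and C, using the absolute difference as the local cost, and all function names, station names and data are assumptions made for this example. Only the primary interpolation is shown.

```python
import numpy as np


def find_zones(series, start, length):
    """Return slices for Zones A, B, C, or None if the conditions fail.

    Zone B is the gap; A and C are equally long neighbors that must be complete.
    """
    y = np.asarray(series, dtype=float)
    if start - length < 0 or start + 2 * length > y.size:
        return None
    a = slice(start - length, start)
    b = slice(start, start + length)
    c = slice(start + length, start + 2 * length)
    if np.isnan(y[a]).any() or np.isnan(y[c]).any():
        return None
    return a, b, c


def dtw_distance(x, y):
    """Classic dynamic-programming DTW with absolute difference as local cost."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cost = np.full((x.size + 1, y.size + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, x.size + 1):
        for j in range(1, y.size + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[x.size, y.size]


def primary_interpolation(target, candidates, start, length):
    """Fill Zone B of `target` from the candidate closest in DTW terms.

    Assumption: closeness is the sum of DTW distances over Zones A and C.
    """
    zones = find_zones(target, start, length)
    if zones is None:
        return None, None
    a, b, c = zones
    best_name, best_dist = None, np.inf
    for name, ref in candidates.items():
        ref = np.asarray(ref, dtype=float)
        if np.isnan(ref[a.start:c.stop]).any():
            continue  # the reference must be complete over A, B and C
        dist = dtw_distance(target[a], ref[a]) + dtw_distance(target[c], ref[c])
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_name is None:
        return None, None
    filled = np.array(target, dtype=float)
    filled[b] = np.asarray(candidates[best_name], dtype=float)[b]
    return best_name, filled


# Synthetic data for illustration only: 15 days with a 5-day gap at days 5..9.
days = np.arange(15)
truth = 10 + 0.5 * days + np.sin(days)
target = truth.copy()
target[5:10] = np.nan
candidates = {"near": truth + 0.3, "far": truth[::-1].copy()}
best, filled = primary_interpolation(target, candidates, start=5, length=5)
```

Blanking out known values, as done here for `target`, mirrors the test design described in the abstract: the filled values in Zone B can then be compared against `truth` to measure the interpolation error.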
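Funding
Key Scientific Research Project of the Shandong Meteorological Bureau (2021SDQXZ02)
Youth Scientific Research Fund of the Shandong Meteorological Bureau (2021SDQN03)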