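At present, global digital elevation models (Global Digital Elevation Model, GDEM) obtained by remote sensing contain various ground objects in urban areas and therefore cannot reflect the true shape of the Earth's surface. To improve the quality of GDEM products in urban areas, this paper proposes a GDEM correction method that accounts for spatial heterogeneity and uses SHAP (Shapley Additive Explanations) feature screening. The method first uses Gaussian Mixture Model (GMM) clustering and Thiessen polygons to divide the study area into homogeneous subregions. This addresses the heterogeneity of the GDEM correction error and of the spatial relationships between the error and the feature factors. It then uses the SHAP interpretability framework to screen the optimal feature variables for each subregion, and builds a corresponding correction model with random forest (RF) for urban GDEM correction. To verify the practicability and efficiency of the method, the 30 m resolution COPDEM of New York (COPDEM30) was taken as the study object. The predictions of the proposed method were compared with those of three other methods, namely a local random forest accounting for spatial heterogeneity (SH-RF), a global random forest with SHAP feature screening (FS-RF), and a traditional random forest (RF), as well as with FABDEM, a GDEM from which building and vegetation biases have been removed. The experimental results show that the proposed method performed best, followed by SH-RF and FS-RF, with RF performing worst. In addition, compared with FABDEM, the proposed method reduced the mean absolute error (MAE) by 42.3% and the root mean square error (RMSE) by 63.2%. A transfer experiment showed that, compared with the original COPDEM30, the corrected GDEM reduced MAE and RMSE by 50.5% and 50.4%, respectively.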
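The subregion partitioning step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the clustering variables (point coordinates plus GDEM error), the number of GMM components, and the synthetic data are all hypothetical. Thiessen (Voronoi) polygons are not built explicitly here. Assigning a location to the cell of its nearest sample point is equivalent to Voronoi cell membership, so a k-d tree nearest-neighbour query stands in for the polygon overlay.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical sample points: x/y coordinates plus a GDEM error (GDEM - DTM, metres).
n = 400
xy = rng.uniform(0, 10, size=(n, 2))
dense_core = xy[:, 0] < 5  # synthetic stand-in for dense built-up area vs. suburbs
error = np.where(dense_core, 15.0, 2.0) + rng.normal(0, 1.0, n)

# 1) GMM clustering of the points; clustering on coordinates + error is an assumption.
Z = StandardScaler().fit_transform(np.column_stack([xy, error]))
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(Z)
labels = gmm.predict(Z)

# 2) Thiessen (Voronoi) cells: a location lies in the cell of its nearest sample
#    point, so it inherits that point's cluster label.
tree = cKDTree(xy)


def subregion_of(query_xy):
    """Return the subregion label of each query location via nearest sample point."""
    _, nearest = tree.query(query_xy)
    return labels[nearest]


# Rasterize subregion membership on a coarse 50 x 50 grid over the study area.
gx, gy = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
grid = np.column_stack([gx.ravel(), gy.ravel()])
zone = subregion_of(grid).reshape(gx.shape)
print(np.bincount(zone.ravel()))  # grid cells per subregion
```

Each subregion label in `zone` would then get its own feature screening and RF correction model. Merging Thiessen cells by cluster label keeps the subregions spatially contiguous, which per-point GMM labels alone do not guarantee.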
Improving Urban Digital Elevation Models Based on an Interpretable Random Forest Method Considering Spatial Heterogeneity
Due to the influence of buildings, the Global Digital Elevation Model (GDEM) obtained through remote sensing still contains various ground features in urban areas and cannot reflect the bare Earth's surface. This limits its use as basic data in hydrological simulation, geological disaster prediction, urban construction, and other fields. Therefore, to improve the quality of GDEM products in urban areas, this paper proposes a GDEM correction method that takes into account spatial heterogeneity and SHAP (Shapley Additive Explanations) feature screening. First, the Gaussian Mixture Model (GMM) and Thiessen polygons were used to divide the study region into several sub-areas, addressing the heterogeneity of the GDEM correction error and of the spatial relationships of the feature factors. Then, Random Forest (RF) was used as the base model, coupled with the SHAP interpretability framework, to screen the optimal feature variables in each subregion. Finally, based on the selected features, the corresponding correction model was rebuilt for urban GDEM correction. To verify the practicability and efficiency of the proposed method, 23 initial feature variables were selected. The 30 m resolution COPDEM (COPDEM30) of New York was taken as the research object, and airborne Light Detection And Ranging (LiDAR) DTM data were used as the reference. The proposed method was compared with three other methods, namely a spatially heterogeneous random forest (SH-RF), a global random forest with SHAP feature screening (FS-RF), and a traditional Random Forest (RF), as well as with an existing GDEM product (FABDEM) from which building and vegetation biases have been removed. The experimental results show that the proposed method had the best prediction performance, with its Mean Absolute Error (MAE) decreasing from 5.209 m to 1.436 m and its Root Mean Square Error (RMSE) decreasing from 8.884 m to 2.258 m. It was followed by SH-RF, whose MAE and RMSE decreased by 3.607 m and 6.389 m, respectively. RF performed worst, with MAE decreasing by 3.179 m and RMSE by 5.838 m. In addition, compared with FABDEM, the MAE and RMSE of the proposed method were reduced by 42.3% and 63.2%, respectively. Transfer experiments on the proposed method showed that, compared with the original COPDEM30, the MAE and RMSE of the corrected GDEM were reduced by 50.5% and 50.4%, respectively. Visual comparison of the DEM before and after correction also showed that the corrected COPDEM30 not only retained the topographic features well but also had an elevation distribution similar to that of the LiDAR DTM. Therefore, the proposed method shows good generalization ability.
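The per-subregion SHAP screening and RF rebuild can be sketched as follows. This assumes that the RF target is the GDEM error (GDEM minus LiDAR DTM), so the corrected DEM's residual equals the true error minus the predicted error. The sketch does not use the shap library's tree explainer. Instead, it computes exact interventional Shapley values by enumerating every feature coalition against a background sample. That approach is only feasible for a handful of features, while the paper starts from 23. The five feature names, the number of retained features, and the synthetic data are therefore hypothetical.

```python
import itertools
import math

import numpy as np
from sklearn.ensemble import RandomForestRegressor


def exact_shapley(model, X_explain, X_background):
    """Exact interventional Shapley values, enumerating all 2**p feature coalitions."""
    n_exp, p = X_explain.shape
    n_bg = len(X_background)
    value = {}  # coalition S -> mean prediction per explained row
    for r in range(p + 1):
        for S in itertools.combinations(range(p), r):
            # Features in S come from the explained rows, the rest from background rows.
            mixed = np.repeat(X_background[None, :, :], n_exp, axis=0)
            if S:
                idx = list(S)
                mixed[:, :, idx] = X_explain[:, None, idx]
            preds = model.predict(mixed.reshape(-1, p)).reshape(n_exp, n_bg)
            value[S] = preds.mean(axis=1)
    phi = np.zeros((n_exp, p))
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for r in range(p):
            weight = math.factorial(r) * math.factorial(p - r - 1) / math.factorial(p)
            for S in itertools.combinations(others, r):
                phi[:, i] += weight * (value[tuple(sorted(S + (i,)))] - value[S])
    return phi, value


def shap_screen(X, y, names, keep, rng):
    """Fit an RF, rank features by mean |Shapley value|, return the top `keep` names."""
    rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
    rf.fit(X, y)
    bg = X[rng.choice(len(X), size=20, replace=False)]
    ex = X[rng.choice(len(X), size=40, replace=False)]
    phi, value = exact_shapley(rf, ex, bg)
    importance = np.abs(phi).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [names[k] for k in order[:keep]], importance, phi, value


def mae_rmse(pred, ref):
    err = pred - ref
    return np.mean(np.abs(err)), np.sqrt(np.mean(err ** 2))


# Synthetic stand-in for one subregion; feature names are hypothetical.
rng = np.random.default_rng(42)
names = ["ndsm_proxy", "slope", "ndvi", "aspect", "noise"]
X = rng.uniform(0, 1, size=(600, len(names)))
y = 6.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(0, 0.2, 600)  # GDEM error (m)

train, test = np.arange(450), np.arange(450, 600)
selected, importance, phi, value = shap_screen(X[train], y[train], names, keep=2, rng=rng)
cols = [names.index(n) for n in selected]

# Rebuild the correction model on the screened features only.
rf_sel = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train][:, cols], y[train])
mae, rmse = mae_rmse(rf_sel.predict(X[test][:, cols]), y[test])
print(selected, np.round(importance, 3), round(mae, 3), round(rmse, 3))
```

Ranking by mean absolute Shapley value is a common global-importance summary. Shapley efficiency (the per-row attributions sum to the prediction minus the background mean) is a quick sanity check on any implementation. In the full method, this screen-and-refit loop would run once for each subregion produced by the GMM and Thiessen partitioning.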
Keywords: global digital elevation model; correction; spatial heterogeneity; SHAP; feature selection; urban area; accuracy assessment