A settlement-based spatial dis-aggregation algorithm for demographic data
Using random forest models to explore the non-linear relationship between population density and its impact factors is at the frontier of current population distribution research. However, the problem of optimal transport of demographic data under informal constraints in the process of spatial dis-aggregation has not been properly addressed. Based on an areal weighting technique, this study took the settlement distribution into account and developed a spatial dis-aggregation algorithm for demographic data. The algorithm began with a spatial dataset of village population in vector format and used the settlement and hectare grid datasets as constraints. The raster dataset of population density (SJZ_RK) was obtained by dis-aggregating the village resident population data into settlements and hectare grids. The analysis showed that the total population of the SJZ_RK dataset is 10.396 million, with an error of only 0.04%, indicating that the proposed spatial dis-aggregation algorithm has high accuracy. The Gini coefficient of population distribution in SJZ_RK (0.8909) is greater than that in GHS_POP (0.8548), SJZ_CUN_RK (0.5898), and GPWv4 (0.5897). This indicates that SJZ_RK, which considers the distribution of settlements, effectively characterizes the spatial agglomeration and heterogeneity of population distribution. It provides high-quality population density label data for constructing training samples for supervised machine learning models such as population density random forest models. In terms of depicting non-settlement areas, urban settlement areas, and value domain ranges, SJZ_RK was more accurate than GHS_POP in the first two aspects and significantly outperformed GPWv4 and SJZ_CUN_RK in all three. The algorithm resolved two problems. 1) The program for calculating a high-precision population density raster dataset was optimized, resulting in a relatively precise discrete representation of population distribution. 2) The raster granularity of the population density label data and the influence factor data was unified, so that the training samples of the population density random forest model were free from the modifiable areal unit problem (MAUP), and the necessary conditions were created to overcome the ecological fallacy.
population density; areal weighting; dis-aggregation algorithm; settlement
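
The settlement-constrained areal weighting idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each grid cell carries a village identifier and a settlement area, and that a village's population is split across its cells in proportion to settlement area. The function names, the uniform fallback for villages without mapped settlements, and the toy values are all hypothetical. A Gini coefficient helper is included because the abstract uses that index to compare datasets.

```python
import numpy as np


def disaggregate(village_pop, village_id, settlement_area):
    """Split each village's population over its grid cells,
    weighting every cell by the settlement area it contains."""
    village_id = np.asarray(village_id)
    area = np.asarray(settlement_area, dtype=float)
    grid = np.zeros_like(area)
    for vid, pop in village_pop.items():
        mask = village_id == vid
        total_area = area[mask].sum()
        if total_area > 0:
            # Areal weighting restricted to settlement area.
            grid[mask] = pop * area[mask] / total_area
        elif mask.any():
            # Assumption (not from the paper): spread the population
            # uniformly when a village has no mapped settlement.
            grid[mask] = pop / mask.sum()
    return grid


def gini(values):
    """Gini coefficient of a non-negative array (0 = perfectly even)."""
    x = np.sort(np.asarray(values, dtype=float).ravel())
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)


# Toy data (hypothetical): two villages on a 2 x 3 grid.
village_id = np.array([[1, 1, 1],
                       [2, 2, 2]])
settlement_area = np.array([[0.0, 0.5, 0.5],
                            [0.0, 0.0, 1.0]])
village_pop = {1: 100.0, 2: 50.0}

grid = disaggregate(village_pop, village_id, settlement_area)
# Same villages with population spread evenly, ignoring settlements.
uniform = disaggregate(village_pop, village_id, np.ones_like(settlement_area))

print(grid)
print("total:", grid.sum())
print("Gini (settlement-weighted):", gini(grid))
print("Gini (uniform):", gini(uniform))
```

In this toy case the village totals are preserved, cells without settlements receive no population, and the settlement-weighted grid has a higher Gini coefficient than the uniform one. That matches the direction of the comparison reported in the abstract.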