A settlement-based spatial dis-aggregation algorithm for demographic data

Original Chinese title: 基于聚落的人口统计数据空间分解算法

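Abstract (translated from the Chinese): Using random forest models of population density to explore the non-linear relationship between population density and its influencing factors is at the frontier of current population distribution research, but the optimal transport problem under informal constraints that arises when demographic data are spatially dis-aggregated has not yet been properly solved. Based on the areal weighting method, this study takes a village population dataset in vector format as the starting point and settlement and hectare-grid datasets in vector format as constraints, and designs a spatial dis-aggregation algorithm for demographic data that accounts for settlement distribution. By dis-aggregating village resident population data first into settlements and then into hectare grid cells, a raster population density dataset (SJZ_RK) was obtained. The analysis shows that the total population of the SJZ_RK dataset is 10.396 million, an error of only 0.04%, which indicates that the proposed algorithm is highly accurate. The Gini coefficients of population distribution rank as SJZ_RK (0.8909) > GHS_POP (0.8548) > SJZ_CUN_RK (0.5898) > GPWv4 (0.5897), indicating that SJZ_RK, which takes settlement distribution into account, captures the spatial agglomeration and heterogeneity of the population distribution well and provides high-quality population density label data for building training samples for supervised machine learning models such as random forest models of population density. In depicting non-settlement areas, urban settlement areas, and the value range, SJZ_RK is closer to reality: it outperforms GHS_POP in the first two respects and significantly outperforms both SJZ_CUN_RK and GPWv4 in all three. The algorithm resolves two problems: (1) it optimizes the computational procedure for obtaining a high-precision raster population density dataset, achieving a relatively accurate discretized representation of population distribution; (2) it unifies the granularity of the population density label data and the influencing factor data, thereby creating the necessary conditions for freeing the training samples of random forest models of population density from the modifiable areal unit problem (MAUP) and for overcoming the ecological fallacy in such models.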
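The two-stage dis-aggregation described in the abstract (village to settlement, then settlement to hectare cell) can be illustrated with a minimal areal-weighting sketch. The abstract does not give the exact weighting rules, so this sketch assumes that each source total is split purely in proportion to intersected area. The function name `areal_weighting`, the identifiers (`V1`, `S1`, `C1`, ...), and all numbers are hypothetical illustration data, not values from the paper.

```python
from collections import defaultdict


def areal_weighting(source_totals, pieces):
    """Split each source total across its pieces in proportion to piece area.

    source_totals: {source_id: population}
    pieces: iterable of (source_id, target_id, area) tuples, where `area`
            is the area of the target that lies inside the source.
    Returns {target_id: population}.

    Assumption: a source with no pieces (e.g. a village without any mapped
    settlement) contributes nothing, so its population would be lost.
    """
    area_by_source = defaultdict(float)
    for source_id, _, area in pieces:
        area_by_source[source_id] += area

    allocated = defaultdict(float)
    for source_id, target_id, area in pieces:
        total_area = area_by_source[source_id]
        if total_area > 0:
            share = area / total_area
            allocated[target_id] += source_totals.get(source_id, 0.0) * share
    return dict(allocated)


# Hypothetical village populations (persons).
village_pop = {"V1": 1200.0, "V2": 300.0}

# Stage 1: settlement area (ha) inside each village.
village_to_settlement = [
    ("V1", "S1", 3.0),
    ("V1", "S2", 1.0),
    ("V2", "S3", 2.0),
]
settlement_pop = areal_weighting(village_pop, village_to_settlement)

# Stage 2: settlement area (ha) inside each 1 ha grid cell.
settlement_to_cell = [
    ("S1", "C1", 1.0), ("S1", "C2", 1.0), ("S1", "C3", 1.0),
    ("S2", "C3", 0.5), ("S2", "C4", 0.5),
    ("S3", "C5", 1.0), ("S3", "C6", 1.0),
]
cell_pop = areal_weighting(settlement_pop, settlement_to_cell)

# Conservation check: relative difference between input and output totals.
total_in = sum(village_pop.values())
total_out = sum(cell_pop.values())
relative_error = abs(total_out - total_in) / total_in
```

Because every cell here is one hectare, the allocated population per cell can be read directly as persons per hectare, and cells that intersect no settlement simply receive no population. The `relative_error` check mirrors the kind of total-population comparison the abstract reports, under the assumptions stated above.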

English abstract: Exploring the non-linear relationship between population density and its influencing factors with random forest models of population density is at the frontier of current population distribution research. However, the problem of optimal transport of demographic data under informal constraints during spatial dis-aggregation has not been properly addressed. Based on an areal weighting technique, this study took settlement distribution into account and developed a spatial dis-aggregation algorithm for demographic data. The algorithm began with a spatial dataset of village population in vector format and used settlement and hectare-grid datasets as constraints. The raster dataset of population density (SJZ_RK) was obtained by dis-aggregating the village resident population data first into settlements and then into hectare grid cells. The analysis demonstrated that the total population of the SJZ_RK dataset is 10.396 million, with an error of only 0.04%, indicating that the proposed spatial dis-aggregation algorithm has high accuracy. The Gini coefficient of population distribution in SJZ_RK (0.8909) is greater than that in GHS_POP (0.8548), SJZ_CUN_RK (0.5898), and GPWv4 (0.5897). This indicates that SJZ_RK, which considers the distribution of settlements, effectively characterizes the spatial agglomeration and heterogeneity of population distribution. It provides high-quality population density label data for constructing training samples for supervised machine learning models such as random forest models of population density. In depicting non-settlement areas, urban settlement areas, and value ranges, SJZ_RK was more accurate than GHS_POP in the first two aspects and significantly outperformed GPWv4 and SJZ_CUN_RK in all three aspects. The algorithm resolved two problems. 1) The procedure for calculating a high-precision population density raster dataset was optimized, resulting in a relatively precise discrete representation of population distribution. 2) The granularity of the population density label data and the influencing factor data was unified, so that the training samples of the population density random forest model were freed from the modifiable areal unit problem (MAUP), and the necessary conditions were created to overcome the ecological fallacy.
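To make the Gini comparison in the abstract concrete, the following is a minimal sketch of a Gini coefficient computed over per-cell population values. The abstract does not state which Gini formulation the authors used, so this assumes the standard rank-based formula for equally sized cells; the function name `gini` and the sample grids are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values (equal-weight units).

    Uses the rank-based form G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x_i are the values sorted in ascending order and i starts at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical 4-cell grids holding the same total population of 40.
evenly_spread = [10.0, 10.0, 10.0, 10.0]   # population smeared over all cells
concentrated = [0.0, 0.0, 0.0, 40.0]       # population confined to one cell

g_even = gini(evenly_spread)     # 0.0
g_conc = gini(concentrated)      # 0.75
```

Under this formulation, a grid that leaves non-settlement cells empty scores higher than one that spreads the same total evenly across administrative units, which is consistent with the ordering of datasets reported in the abstract.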

English keywords:

population density; areal weighting; dis-aggregation algorithm; settlement

Authors:

Li Yancheng (李艳成), Wen Peizhang (温佩璋), Liu Jinsong (刘劲松)

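Author affiliations:

School of Geographical Sciences, Hebei Normal University, Shijiazhuang 050024, Hebei, China

Hebei Technology Innovation Center for Remote Sensing Identification of Environmental Change, Shijiazhuang 050024, Hebei, China

Geocomputation and Planning Research Center, Hebei Normal University, Shijiazhuang 050024, Hebei, China

Hebei Key Laboratory of Environmental Change and Ecological Construction, Shijiazhuang 050024, Hebei, China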

Chinese keywords:

人口密度 (population density); 面积加权 (areal weighting); 分解算法 (dis-aggregation algorithm); 聚落 (settlement)

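Funding:

National Natural Science Foundation of China; National Natural Science Foundation of China; Second Tibetan Plateau Scientific Expedition and Research Program; Natural Science Foundation of Hebei Province; Key Development Fund of Hebei Normal University

Project numbers:

42071167; 40871073; 2019QZKK0406; D2007000272; L2024ZD07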

Publication year:

2024

DOI:

10.13249/j.cnki.sgs.20221123

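Scientia Geographica Sinica (地理科学)

Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences

CSTPCD; CSSCI; CHSSCD; Peking University Core Journals (北大核心)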

Impact factor: 3.117

ISSN: 1000-0690

Year, volume (issue): 2024, 44(7)