A spatially constrained clustering method for fine-scale geographical partitioning (一种面向精细化地理分区的空间约束聚类方法)

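Abstract (translated from Chinese): In research on spatial partitioning, the classic k-means clustering algorithm has already been combined with spatial constraints, but the resulting methods are poorly suited to spatial clustering of contiguous, tessellated areal geographic features. This paper therefore examines how to impose spatial constraints on the k-means algorithm. By improving the spatial-constraint mechanism of the SKATER algorithm, it constructs a spatially constrained k-means algorithm that comprises a natural expansion process and a suboptimal expansion process, and it compares the algorithm with existing methods on two public datasets. The results show that the proposed method is particularly suited to partitioning contiguous, tessellated areal geographic features. Judged by three evaluation metrics (the silhouette coefficient, the DB index, and the total residual sum of squares), it outperforms the existing SKATER, AZP, and SC k-means methods. The work provides a new tool for spatial data processing in geographic information systems and a new perspective for research on clustering algorithms.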
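The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors computed them, so the functions below follow the standard textbook definitions. The function names and the toy data are assumptions made for illustration, and the code assumes at least two clusters with distinct centroids.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroids(points, labels):
    """Mean attribute vector of each cluster."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(c) / len(ps) for c in zip(*ps))
            for lab, ps in groups.items()}


def total_sse(points, labels):
    """Total residual sum of squares: squared distance of each unit to its centroid."""
    cen = centroids(points, labels)
    return sum(dist(p, cen[lab]) ** 2 for p, lab in zip(points, labels))


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better)."""
    cen = centroids(points, labels)
    scatter = {}
    for lab in cen:
        members = [p for p, l in zip(points, labels) if l == lab]
        scatter[lab] = sum(dist(p, cen[lab]) for p in members) / len(members)
    worst = [max((scatter[a] + scatter[b]) / dist(cen[a], cen[b])
                 for b in cen if b != a)
             for a in cen]
    return sum(worst) / len(worst)


def silhouette(points, labels):
    """Mean silhouette coefficient (higher is better)."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(dist(p, q))
        if lab not in by_cluster:  # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = sum(by_cluster[lab]) / len(by_cluster[lab])
        b = min(sum(d) / len(d) for k, d in by_cluster.items() if k != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data (hypothetical): two well-separated clusters of two units each.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
sse = total_sse(points, labels)
dbi = davies_bouldin(points, labels)
sil = silhouette(points, labels)
```

A lower total residual sum of squares and DB index, together with a higher silhouette coefficient, indicate more compact and better separated clusters. None of the three metrics checks spatial contiguity.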

English abstract: As a classic clustering algorithm, k-means is widely used because its iterative classification process is simple and efficient. However, when applied to specific spatial partitioning tasks, existing approaches show limitations: they either fail to consider spatial constraints or impose them excessively. This research aims to address these limitations of the traditional k-means clustering algorithm and the SKATER algorithm in spatial partitioning tasks. The study enhances the k-means algorithm with spatial constraints, refining the constraint mechanism used in the SKATER algorithm to better accommodate the needs of spatial data analysis. The new method seeks to provide a more robust clustering framework that respects both the attribute similarity and the spatial contiguity of tessellated planar geographic features. The enhanced clustering algorithm, termed spatially constrained k-means, integrates spatial constraints directly into the clustering process to ensure that members of the same cluster are contiguous in space. This integration is achieved by modifying the clustering operation to prioritize spatial connections during each iteration. The method does not rely on traditional objective functions or heuristic approaches; instead, it expands clusters based on direct spatial adjacency, so that the clustering process naturally adheres to the geographic continuity of the data. The effectiveness of this approach was tested using two distinct datasets: the 2020 urban population data from China and historical socio-economic data from 1930s France. These datasets were chosen to illustrate the algorithm's capability across different types of spatial data and scales.
The performance of the proposed method was benchmarked against the existing spatially constrained k-means, the SKATER algorithm, and other spatial methods, using visual coherence and numerical indices such as the Davies-Bouldin index and the silhouette coefficient. The proposed spatially constrained k-means algorithm outperformed the evaluated alternatives. Visually, it produced clusters that closely mirrored the natural and human-made boundaries inherent in the datasets, which makes the clustering results easier to interpret and use in urban planning, resource management, environmental monitoring, and market analysis. Numerically, the proposed method showed marked improvements in cluster cohesion and separation, outperforming SKATER, AZP, and the existing spatially constrained (SC) k-means method on both the Davies-Bouldin index and the silhouette coefficient. This performance underscores the method's ability to maintain spatial contiguity without sacrificing attribute similarity, making it particularly valuable in applications such as geographic zoning and resource management, where understanding spatial distribution is crucial. By incorporating spatial constraints into the k-means algorithm, this research provides a methodological enhancement for spatial data analysis within Geographic Information Systems (GIS). The proposed algorithm not only improves the practicality and accuracy of spatial clustering but also offers new insights into the integration of spatial considerations in data clustering. Developing a comprehensive evaluation metric that includes spatial integrity as a component could further improve the assessment of spatial clustering methods.
This research paves the way for future studies to explore adaptive spatial constraints based on varying geographic and dataset-specific requirements, potentially leading to more refined and contextually appropriate clustering solutions.
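The idea of expanding clusters through direct spatial adjacency can be sketched as a simple region-growing loop. This is not the authors' algorithm: the abstract does not specify the natural and suboptimal expansion steps, so the sketch substitutes a plain greedy rule (absorb the unassigned neighbour whose attribute is closest to the cluster mean). The function name `grow_regions`, the single-attribute units, and the toy chain of units are all assumptions made for illustration.

```python
def grow_regions(values, neighbors, seeds):
    """Greedy adjacency-constrained growth (illustrative, not the paper's method).

    values:    dict unit -> attribute value (float)
    neighbors: dict unit -> set of spatially adjacent units
    seeds:     list of units, one starting unit per cluster
    Returns a dict unit -> cluster index.
    """
    label = {s: k for k, s in enumerate(seeds)}
    members = {k: [s] for k, s in enumerate(seeds)}
    while len(label) < len(values):
        best = None  # (cost, unit, cluster)
        for k, units in members.items():
            mean = sum(values[u] for u in units) / len(units)
            for u in units:
                for v in neighbors[u]:
                    if v in label:
                        continue  # only unassigned neighbours can be absorbed
                    cost = abs(values[v] - mean)
                    if best is None or cost < best[0]:
                        best = (cost, v, k)
        if best is None:
            break  # remaining units are not adjacent to any cluster
        _, v, k = best
        label[v] = k
        members[k].append(v)
    return label


# Toy example (hypothetical): six units in a chain 0-1-2-3-4-5.
values = {0: 1.0, 1: 1.0, 2: 2.0, 3: 8.0, 4: 10.0, 5: 10.0}
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
result = grow_regions(values, neighbors, seeds=[0, 5])
```

Because a unit can only join a cluster that already contains one of its neighbours, every cluster stays spatially contiguous by construction. Attribute similarity only decides which adjacent unit is absorbed next.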

English keywords:

clustering analysis; spatial data processing; k-means algorithm; geographic information systems; spatial constraints; spatial partitioning; clustering quality improvement; data science

Authors:

丘铂钧, 贾嘉楠, 徐柱

Affiliation:

Faculty of Geosciences and Engineering, Southwest Jiaotong University (西南交通大学地球科学与工程学院), Chengdu 611756

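Keywords (translated from Chinese):

clustering analysis; spatial data processing; k-means algorithm; geographic information systems; spatial constraints; spatial partitioning; clustering quality improvement; data science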

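Funding:

National Key Research and Development Program of China; Major Program of the National Natural Science Foundation of China

Project numbers:

2022YFB3904202; 42394063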

Publication year:

2024

DOI:

10.20117/j.jsti.202403011

Geomatics World (地理信息世界)

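China Association for Geospatial Industry and Sciences (中国地理信息产业协会); Heilongjiang Bureau of Surveying, Mapping and Geoinformation (黑龙江测绘地理信息局)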

CSTPCD

Impact factor: 0.826

ISSN: 1672-1586

Year, volume (issue): 2024, 31(3)

Number of references: 8