
Erosion Clustering

Density-based clustering is a classical approach to cluster analysis that can discover non-spherical clusters without the number of clusters being specified in advance. Real-world datasets, however, often exhibit blurred boundaries between clusters, uneven data density, and complex data distributions, and few existing studies address all three problems at once. Taking inspiration from erosion in the natural world, this paper proposes the erosion clustering (EC) algorithm. First, a dynamic density estimation method is combined with an erosion strategy to identify and remove the data on cluster boundaries layer by layer, revealing the latent core region of each cluster. Next, a clustering method based on the mutual reachability graph groups the core regions. Finally, an allocation strategy based on local density peaks assigns the eroded boundary data to clusters. Experiments on 12 benchmark datasets show that, compared with seven competing algorithms, EC improves clustering performance by an average of 96%, 53%, and 36% in adjusted Rand index, adjusted mutual information, and F1 score, respectively.
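The three stages named in the abstract (erode the boundary layer by layer, cluster the remaining cores, assign the eroded data) can be illustrated with a rough sketch. This is not the authors' implementation, and the abstract does not give their formulas. Everything specific here is an assumption made for illustration: the function names, the parameters `k`, `layers`, and `erode_frac`, the density proxy (inverse mean distance to the k nearest neighbours, compared against the neighbours' own density), and the rule of eroding a fixed fraction per layer. Connected components of a mutual k-nearest-neighbour graph stand in for the mutual-reachability-graph clustering, and nearest-labelled-point assignment stands in for the local-density-peak allocation.

```python
import math

def knn_lists(points, ids, k):
    """Brute force: for each id, its (up to) k nearest neighbours among ids."""
    out = {}
    for i in ids:
        ranked = sorted((j for j in ids if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        out[i] = ranked[:k]
    return out

def density(points, i, neighbours):
    """Assumed density proxy: inverse mean distance to the given neighbours."""
    mean_d = sum(math.dist(points[i], points[j]) for j in neighbours) / len(neighbours)
    return 1.0 / (mean_d + 1e-12)

def erosion_clustering(points, k=4, layers=2, erode_frac=0.15):
    alive = list(range(len(points)))
    eroded = []  # indices in the order they were eroded

    # Stage 1: erode layer by layer, re-estimating density on the survivors
    # each time (a simple stand-in for "dynamic" density estimation).
    for _ in range(layers):
        if len(alive) <= k + 1:
            break
        nn = knn_lists(points, alive, k)
        dens = {i: density(points, i, nn[i]) for i in alive}
        # Relative density: own density divided by the neighbours' mean density.
        # Boundary points tend to be sparser than their neighbours.
        rel = {i: dens[i] / (sum(dens[j] for j in nn[i]) / len(nn[i])) for i in alive}
        n_drop = max(1, int(erode_frac * len(alive)))
        drop = sorted(alive, key=rel.get)[:n_drop]
        eroded.extend(drop)
        drop_set = set(drop)
        alive = [i for i in alive if i not in drop_set]

    # Stage 2: cluster the cores as connected components of the mutual kNN graph
    # (u and v are linked only if each is among the other's k nearest neighbours).
    nn = knn_lists(points, alive, k)
    labels = {}
    next_label = 0
    for start in alive:
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            u = stack.pop()
            for v in nn[u]:
                if v not in labels and u in nn[v]:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1

    # Stage 3: give eroded points back, last eroded first, each taking the
    # label of the nearest point that already has one.
    for i in reversed(eroded):
        nearest = min(labels, key=lambda j: math.dist(points[i], points[j]))
        labels[i] = labels[nearest]

    return [labels[i] for i in range(len(points))]

# Two well-separated 5x5 grids of points (toy data, not from the paper).
points = [(x, y) for x in range(5) for y in range(5)]
points += [(x + 20, y) for x in range(5) for y in range(5)]
labels = erosion_clustering(points)
```

On this toy input the two grids are far enough apart that no cluster label can be shared between them. How many clusters each grid ends up with depends on the assumed parameters, so the sketch is best read as a picture of the control flow rather than a tuned method.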

Keywords: density-based clustering; cluster analysis; density estimation; local density peak; mutual k-nearest neighbor; erosion strategy

Du Mingjing (杜明晶), Wu Fuyu (吴福玉), Li Yurui (李宇蕊), Dong Yongquan (董永权)


School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, Jiangsu 221116

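Key Laboratory of Educational Intelligent Technology of Jiangsu Higher Education Institutions, Jiangsu Normal University, Xuzhou, Jiangsu 221116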


Funding: National Natural Science Foundation of China (Nos. 62006104, 61872168)

2024

Acta Electronica Sinica (电子学报)
Chinese Institute of Electronics


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.237
ISSN: 0372-2112
Year, volume (issue): 2024, 52(10)