首页|基于局部密度峰和标签传播的最小生成树聚类

基于局部密度峰和标签传播的最小生成树聚类

扫码查看
基于最小生成树(minimum spanning tree,MST)的聚类算法能够识别具有任意形状的簇,该算法在如何有效构建最小生成树和识别无效边方面存在不足,而且易受到噪声点影响.本文利用密度峰值聚类算法思想的优点来寻找局部密度峰,局部密度峰在保留原始数据集分布结构的同时,排除了噪声点,因此,将局部密度峰与最小生成树聚类算法相结合,采用标签传播,提出了基于局部密度峰和标签传播的最小生成树聚类算法(DPMST).该算法采用了局部密度峰之间基于共享邻的距离,利用局部密度峰之间的邻域信息,有效构造最小生成树和识别无效边,使算法能够发现具有复杂结构的簇.标签传播增强强标签,削弱弱标签,以细化错误的标签,特别是对于边界点以及揭示复杂流形,能够提高聚类结果的质量.人工和真实数据集上的实验结果表明,与经典聚类算法DPC、MST、K-means、DBSCAN、AP、SC和BIRCH比较,DPMST算法表现优异.
Minimum Spanning Tree Clustering Based on Local Density Peak and Label Propagation
Clustering algorithm based on the minimum spanning tree(MST)can identify clusters with arbitrary shapes,but the algorithm has limitations in efficiently constructing a minimum spanning tree and identifying invalid edges and is easily influenced by noise points.This study proposes an MST clustering algorithm based on local density peaks and label propagation(DPMST)by combining the advantages of the density peaks clustering algorithm to find local density peaks and exclude noise points with the MST algorithm.The DPMST algorithm adopts the shared neighbors-based distance between local density peaks and uses the neighborhood information between local density peaks to efficiently construct minimum spanning trees and identify invalid edges,enabling the discovery of clusters with complex structures.Label propagation is used to enhance the strong labels and weaken the weak labels to refine wrong labels,which can improve the quality of clustering results,especially for border region points as well as revealing complex manifolds.The experimental results on several synthetic and real-world datasets show that the DPMST algorithm outperforms classical clustering algorithms DPC,MST,K-means,DBSCAN,AP,SC,and BIRCH.

local density peakminimum spanning tree(MST)label propagationclustering

林钰莹、侯新民

展开 >

中国科学技术大学大数据学院,合肥 230026

中国科学技术大学数学科学学院,合肥 230026

中国科学院吴文俊数学重点实验室,合肥 230026

合肥国家实验室,合肥 230088

展开 >

局部密度峰 最小生成树 标签传播 聚类

国家自然科学基金量子通信与量子计算机重大项目

120714532021ZD0302902

2024

计算机系统应用
中国科学院软件研究所

计算机系统应用

CSTPCD
影响因子:0.449
ISSN:1003-3254
年,卷(期):2024.33(8)