
Density peaks clustering algorithm with nearest neighbor optimization for data with uneven density distribution

Data with uneven density distribution are data in which the sparsity of the sample distribution differs between clusters. On such data, the density peaks clustering (DPC) algorithm tends to find cluster centers in higher-density regions and is prone to assigning samples from sparse clusters to dense clusters. To avoid these defects, this paper proposes a density peaks clustering algorithm with nearest neighbor optimization (DPC-NNO) for data with uneven density distribution. DPC-NNO combines reverse nearest neighbors and k-nearest neighbors to define a new local density that raises the local density of sparse samples, so that cluster centers are found more accurately. In defining the assignment strategy, it introduces shared nearest neighbors to compute the similarity between samples and construct a similarity matrix, which ties samples of the same cluster more closely together and avoids misassignment. DPC-NNO is compared with the IDPC-FA, DPCSA, FNDPC, FKNN-DPC, and DPC algorithms. Experimental results show that DPC-NNO achieves excellent clustering results on data with uneven density distribution, and that its overall performance on complex datasets and UCI datasets is better than that of the compared algorithms.
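
The abstract names three neighborhood quantities (k-nearest neighbors, reverse nearest neighbors, shared nearest neighbors) without giving formulas. The sketch below shows one common way to compute them with NumPy. It is not the DPC-NNO definition: the helper names and the toy density `rho` (reverse-neighbor count divided by mean k-NN distance) are assumptions made only to illustrate how a reverse-neighbor term can be folded into a k-NN density.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_indices(D, k):
    """Indices of the k nearest neighbors of each sample, excluding itself."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)          # a sample is not its own neighbor
    return np.argsort(D, axis=1)[:, :k]

def reverse_nn_counts(knn):
    """How many samples list sample i among their k nearest neighbors."""
    return np.bincount(knn.ravel(), minlength=knn.shape[0])

def shared_nn_counts(knn):
    """Entry (i, j) is the number of neighbors shared by kNN(i) and kNN(j)."""
    n = knn.shape[0]
    A = np.zeros((n, n), dtype=int)
    A[np.arange(n)[:, None], knn] = 1    # A[i, j] = 1 if j is in kNN(i)
    return A @ A.T

# Toy data: three points near the origin, two far away.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.5, 0.0], [10.0, 10.0], [10.0, 11.0]])
k = 2

D = pairwise_distances(X)
knn = knn_indices(D, k)
rnn = reverse_nn_counts(knn)
snn = shared_nn_counts(knn)

# Hypothetical density, NOT the paper's formula: samples that many others
# point to get a boost on top of the inverse mean k-NN distance.
mean_knn_dist = np.take_along_axis(D, knn, axis=1).mean(axis=1)
rho = (1.0 + rnn / k) / mean_knn_dist

print("reverse-neighbor counts:", rnn)
print("shared-neighbor matrix:\n", snn)
print("toy density:", np.round(rho, 3))
```

A shared-neighbor matrix like `snn` is a natural starting point for a similarity matrix, but how DPC-NNO weights it and uses it for assignment is specified in the paper, not here.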

Keywords: density peaks; clustering analysis; uneven density distribution; reverse nearest neighbor; shared nearest neighbor; similarity of samples

Chen Weichang, Zhao Jia, Xiao Renbin, Wang Hui, Cui Zhihua


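School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099

School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074

School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024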


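Funding: National Natural Science Foundation of China (52069014, 51669014); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0101200)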

2024

Control and Decision (控制与决策)
Northeastern University


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.227
ISSN:1001-0920
Year, volume (issue): 2024, 39(3)