
Density peak clustering algorithm based on weighted reverse nearest neighbor for uneven density datasets
For data with uneven density distribution, the density peak clustering (DPC) algorithm tends to overlook differences in sample density between clusters, so cluster centers may be selected incorrectly. In addition, its allocation strategy tends to assign samples in sparse regions to dense regions by mistake, which degrades clustering performance. To address these problems, this paper proposes a density peak clustering algorithm based on weighted reverse nearest neighbors (DPC-WR) for datasets with uneven density distribution. First, a weight coefficient based on the sigmoid function is introduced into the local density formula to increase the weight of samples in sparse regions; combined with the idea of reverse nearest neighbors, the local density of samples is redefined, which effectively improves the recognition rate of cluster centers. Second, an improved sample similarity strategy is introduced that uses the reverse nearest neighbors and shared reverse nearest neighbors of samples, so that samples in the same cluster have high similarity to one another; this effectively reduces allocation errors for samples in sparse regions. Comparative experiments on datasets with uneven density distribution, datasets with complex shapes, and UCI datasets show that DPC-WR achieves better clustering results than the IDPC-FA, FNDPC, FKNN-DPC, DPC, and DPCSA algorithms.
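The abstract describes DPC-WR only in words: a sigmoid-weighted local density computed over reverse nearest neighbors, followed by assignment driven by a similarity built from reverse and shared reverse nearest neighbors. The Python sketch below shows one way such a pipeline can sit around the standard DPC decision quantities ρ (local density) and δ (distance to the nearest higher-density sample). The exact formulas are not given here, so the sigmoid weight on mean k-NN distance, the exponential kernel whose bandwidth is the global mean k-NN distance, the similarity score (shared reverse neighbors plus a k-NN link bonus), and the function name `dpc_wr_sketch` are all hypothetical choices for illustration, not the authors' definitions.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def dpc_wr_sketch(points: List[Point], k: int, n_clusters: int) -> List[int]:
    """Hypothetical DPC variant with a sigmoid-weighted reverse-kNN density.

    The weight, density and similarity formulas are illustrative assumptions,
    not the formulas defined by the DPC-WR authors.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbors of each sample (excluding the sample itself).
    knn = [[j for _, j in sorted((dist[i][j], j) for j in range(n) if j != i)[:k]]
           for i in range(n)]
    knn_sets = [set(nb) for nb in knn]

    # Reverse nearest neighbors: RNN(i) = {j : i is among the k-NN of j}.
    rnn = [set() for _ in range(n)]
    for j in range(n):
        for i in knn[j]:
            rnn[i].add(j)

    # Sparsity indicator: mean distance to the k nearest neighbors.
    s = [sum(dist[i][j] for j in knn[i]) / k for i in range(n)]
    mean_s = sum(s) / n
    std_s = math.sqrt(sum((v - mean_s) ** 2 for v in s) / n) or 1.0
    bandwidth = mean_s or 1.0

    # Hypothetical sigmoid weight in (0, 1): larger for samples in sparse regions.
    w = [1.0 / (1.0 + math.exp(-(v - mean_s) / std_s)) for v in s]

    # Hypothetical weighted local density summed over reverse nearest neighbors.
    rho = [(1.0 + w[i]) * sum(math.exp(-dist[i][j] / bandwidth) for j in rnn[i])
           for i in range(n)]

    # Standard DPC delta: distance to the nearest sample ranked higher in density.
    order = sorted(range(n), key=lambda i: -rho[i])  # stable sort breaks ties by index
    delta = [0.0] * n
    delta[order[0]] = max(dist[order[0]])
    for pos in range(1, n):
        i = order[pos]
        delta[i] = min(dist[i][j] for j in order[:pos])

    # Cluster centers: samples with the largest gamma = rho * delta.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: -gamma[i])[:n_clusters]
    if order[0] not in centers:  # the densest sample must start a cluster
        centers[-1] = order[0]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    def similarity(i: int, j: int) -> int:
        # Hypothetical score: shared reverse neighbors, plus 1 if i and j are k-NN-linked.
        linked = j in knn_sets[i] or i in knn_sets[j]
        return len(rnn[i] & rnn[j]) + int(linked)

    # Visit samples in decreasing density; attach each non-center to the most
    # similar higher-density sample, using the nearest one to break ties.
    for pos in range(1, n):
        i = order[pos]
        if labels[i] != -1:
            continue
        parent = max(order[:pos], key=lambda j: (similarity(i, j), -dist[i][j]))
        labels[i] = labels[parent]
    return labels


if __name__ == "__main__":
    dense = [(0.1 * a, 0.1 * b) for a in range(5) for b in range(5)]
    sparse = [(50.0 + a, 50.0 + b) for a in range(3) for b in range(3)]
    print(dpc_wr_sketch(dense + sparse, k=4, n_clusters=2))
```

In the usage example, one grid blob is dense (spacing 0.1) and the other sparse (spacing 1), and each blob should receive its own label. Because the two blobs share no reverse neighbors, the similarity score keeps assignments local, and the nearest-sample tie-break only decides when no shared neighbor exists.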

density peak clustering; uneven density distribution; reverse nearest neighbor; shared reverse nearest neighbor; sample similarity; local density; allocation strategy; data mining

LYU Li, CHEN Wei, XIAO Renbin, HAN Longzhe, TAN Dekun


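School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, Jiangxi, China

Nanchang Key Laboratory of IoT Perception and Collaborative Computing for Smart City, Nanchang Institute of Technology, Nanchang 330099, Jiangxi, China

School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China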


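Funding: National Natural Science Foundation of China (62066030); Key Research and Development Program of Jiangxi Province (20192BBE50076, 20203BBGL73225); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ190958)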

2024

CAAI Transactions on Intelligent Systems
Chinese Association for Artificial Intelligence; Harbin Engineering University


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.672
ISSN: 1673-4785
Year, volume (issue): 2024, 19(1)