Density peak clustering algorithm based on weighted reverse nearest neighbor for uneven density datasets

Lü Li ¹, Chen Wei ¹, Xiao Renbin ², Han Longzhe ¹, Tan Dekun ¹

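Author information

1. School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, Jiangxi, China; Nanchang Key Laboratory of IoT Perception and Collaborative Computing for Smart City, Nanchang Institute of Technology, Nanchang 330099, Jiangxi, China
2. School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China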
Abstract

For data with uneven density distribution, the density peak clustering (DPC) algorithm tends to overlook differences in sparsity between clusters, which leads to incorrectly selected cluster centers. Moreover, its allocation strategy easily assigns samples in sparse regions to dense regions by mistake, which degrades the clustering result. To address these problems, this paper proposes a density peak clustering algorithm based on the weighted reverse nearest neighbor (DPC-WR) for datasets with uneven density distribution. First, a weight coefficient based on the sigmoid function is introduced into the local density formula to increase the weight of samples in sparse regions; combined with the concept of reverse nearest neighbors, the local density of samples is redefined, which effectively improves the recognition rate of cluster centers. Second, an improved sample similarity strategy is introduced that uses the reverse nearest neighbor and shared reverse nearest neighbor information between samples, so that samples in the same cluster have higher similarity; this effectively reduces the misallocation of samples in sparse regions. Comparative experiments on datasets with uneven density distribution, datasets with complex shapes, and UCI datasets show that DPC-WR achieves better clustering results than the IDPC-FA, FNDPC, FKNN-DPC, DPC, and DPCSA algorithms.
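The abstract builds on two neighborhood notions, reverse nearest neighbors and shared reverse nearest neighbors, without giving their formulas on this page. The following is a minimal sketch of those two notions only, assuming Euclidean distance and a fixed neighborhood size k (both are assumptions, and all function names are hypothetical). It does not reproduce the sigmoid-weighted local density or the similarity measure of DPC-WR.

```python
from math import dist


def k_nearest_neighbors(points, k):
    """For each point index, return the set of indices of its k nearest neighbors."""
    knn = []
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance to p (ties broken by index).
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        knn.append(set(others[:k]))
    return knn


def reverse_nearest_neighbors(knn):
    """For each point i, return the set of points that count i among their k nearest neighbors."""
    rnn = [set() for _ in knn]
    for j, neighbors in enumerate(knn):
        for i in neighbors:
            rnn[i].add(j)
    return rnn


def shared_reverse_neighbors(rnn, i, j):
    """Points that are reverse nearest neighbors of both i and j."""
    return rnn[i] & rnn[j]


# Toy data (illustrative only): a tight square of four points plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 9.0)]
knn = k_nearest_neighbors(points, k=2)
rnn = reverse_nearest_neighbors(knn)

print(rnn[4])                               # the distant point is nobody's neighbor
print(shared_reverse_neighbors(rnn, 1, 2))  # neighbors shared by two points of the square
```

In this toy example the distant point has an empty reverse neighbor set, while points inside the square share reverse neighbors with each other. That asymmetry is the kind of information the abstract says is used to separate sparse-region samples from dense-region ones.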

Key words

density peak clustering/uneven density distribution/reverse nearest neighbor/shared reverse nearest neighbor/sample similarity/local density/allocation strategy/data mining

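Funding

National Natural Science Foundation of China (62066030)

Jiangxi Provincial Key Research and Development Program (20192BBE50076)

Jiangxi Provincial Key Research and Development Program (20203BBGL73225)

Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ190958)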

Publication year

2024

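CAAI Transactions on Intelligent Systems (智能系统学报)

Chinese Association for Artificial Intelligence; Harbin Engineering University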

Indexed in: CSTPCD, Peking University Core Journals

Impact factor: 0.672

ISSN: 1673-4785

Number of references: 35