Density peak clustering algorithm for uncertain data based on JS divergence

Traditional density-based clustering algorithms for uncertain data are sensitive to parameters and cluster uncertain data sets with complex manifold structure poorly. To address these defects, a new density peak clustering algorithm for uncertain data based on JS divergence (UDPC-JS) is proposed. First, the algorithm removes noise points using the uncertain natural neighborhood density factor, which is defined in terms of uncertain natural neighbors. Second, it computes the local density of each uncertain data object by combining uncertain natural neighbors with JS divergence, finds the initial cluster centers of the uncertain data set using the idea of representative points, and defines a distance based on JS divergence and a graph between the initial cluster centers. Third, it constructs a decision graph over the initial cluster centers from this local density and this distance, and selects the final cluster centers according to the decision graph. Finally, each unassigned uncertain data object is assigned to the cluster that contains its initial cluster center. Experimental results show that the algorithm achieves better clustering quality and accuracy than the comparison algorithms, and that its advantage is larger on uncertain data sets with complex manifold structure.
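The abstract names two building blocks, JS divergence and the density-peak decision graph, without giving formulas. The sketch below illustrates only those two generic ideas and is not the UDPC-JS method itself: it omits uncertain natural neighbors, noise removal, representative points, and the graph-based distance. It assumes, purely for illustration, that each uncertain object is a discrete probability distribution over a shared set of bins; the Gaussian-kernel density, the `cutoff` parameter, and the toy data are likewise assumptions taken from the classic density-peak formulation, not from the paper.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits for discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded in [0, 1] with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def decision_graph(objects, cutoff):
    """Classic density-peak quantities, with JS divergence as the dissimilarity.

    rho[i]   : Gaussian-kernel local density of object i (self excluded)
    delta[i] : smallest dissimilarity to any object of strictly higher density;
               for an object with no denser object, its largest dissimilarity
    """
    n = len(objects)
    d = [[js_divergence(objects[i], objects[j]) for j in range(n)] for i in range(n)]
    rho = [
        sum(math.exp(-((d[i][j] / cutoff) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    return rho, delta


# Hypothetical toy data: two groups of three distributions over three bins.
objects = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.75, 0.15, 0.10],
    [0.10, 0.10, 0.80],
    [0.10, 0.20, 0.70],
    [0.15, 0.10, 0.75],
]

rho, delta = decision_graph(objects, cutoff=0.05)

# Objects with both high density and high delta stand out on the decision graph;
# here the two largest products rho * delta are taken as candidate centers.
gamma = [r * dl for r, dl in zip(rho, delta)]
centers = sorted(range(len(objects)), key=lambda i: gamma[i], reverse=True)[:2]
print(centers)
```

On this toy data the two selected centers fall in different groups, because only the densest object of each group has a large `delta`. Picking a fixed number of centers by `rho * delta` is a common simplification of reading the decision graph by eye.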

Keywords: uncertain data; uncertain natural neighbors; JS divergence; density peak; clustering

Authors: Li Song (李松), Liu Xiaonan (刘晓楠), Liu Juan (刘娟)

School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China

Strategic Research Department, Qi An Xin Technology Group Inc. (奇安信科技集团股份有限公司), Beijing 100088, China

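Funding: National Natural Science Foundation of China (62072136); Heilongjiang Provincial Key Research and Development Program (2022ZX01A34); National Key Research and Development Program of China (2020YFB1710200)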

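Journal: Journal of Jilin University (Engineering and Technology Edition) (吉林大学学报(工学版))
Publisher: Jilin University
Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.792
ISSN: 1671-5497
Year, volume (issue): 2024, 54(7)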