Computer Simulation (计算机仿真), 2024, Vol. 41, Issue 6: 448-454.

Evolving Data Stream Clustering Algorithm Based on Density Peaks

Weng Jiaqiao (翁佳桥) 1, Lü Li (吕莉) 1, Fan Tanghuai (樊棠怀) 1, Kang Ping (康平) 1

Author information

  • 1. School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi 330099, China

Abstract

Existing data stream clustering algorithms suffer from low clustering accuracy and cannot detect the evolution of clusters in a data stream. To address these problems, an evolving data stream clustering algorithm based on density peaks (DPStream) is proposed. DPStream adopts a two-stage processing framework of online micro-clustering and offline macro-clustering, and introduces density decay to reflect the recent evolution of the data stream. In the online micro-clustering stage, core micro-clusters and potential micro-clusters capture the generation, evolution, and decline of clusters, and the micro-clusters are maintained incrementally through generation and maintenance mechanisms. When a user clustering request arrives, the density peaks clustering algorithm performs offline macro-clustering: it identifies the cluster centers among the core micro-clusters and assigns each remaining core micro-cluster to the cluster of its corresponding center, which yields the final clustering result. DPStream can return a clustering result at any time during the data stream. Its clustering purity stays above 95% with small windows, so it clusters data streams with high quality and fast response.
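The offline macro-clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the exponential fading form `2^(-lam * dt)` (common in density-based stream clustering), the use of the decayed micro-cluster weight as the local density, and the names `decayed_weight`, `density_peaks`, `lam`, and `n_clusters` are all assumptions made for illustration.

```python
import math


def decayed_weight(weight, dt, lam=0.25):
    """Fade a micro-cluster weight after dt time units.

    Assumed form: weight * 2^(-lam * dt). The paper's actual decay
    function is not given in the abstract.
    """
    return weight * 2.0 ** (-lam * dt)


def density_peaks(centers, weights, n_clusters):
    """Density peaks clustering over core micro-cluster centers.

    rho   : decayed weight of each micro-cluster (assumed density proxy).
    delta : distance to the nearest micro-cluster with higher density.
    Centers are the n_clusters micro-clusters with the largest rho * delta;
    every other micro-cluster joins the cluster of its nearest
    higher-density neighbor. Weights are assumed to be positive.
    """
    n = len(centers)
    dist = [[math.dist(a, b) for b in centers] for a in centers]

    # Indices sorted by density, highest first (ties broken by index).
    order = sorted(range(n), key=lambda i: (-weights[i], i))
    rank = {i: r for r, i in enumerate(order)}

    delta = [0.0] * n
    parent = [-1] * n
    for r, i in enumerate(order):
        if r == 0:
            # The densest micro-cluster takes its largest distance as delta.
            delta[i] = max(dist[i])
            continue
        j = min(order[:r], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Pick cluster centers by the decision value gamma = rho * delta.
    by_gamma = sorted(range(n), key=lambda i: (-weights[i] * delta[i], rank[i]))
    labels = [-1] * n
    for label, i in enumerate(by_gamma[:n_clusters]):
        labels[i] = label

    # Assign the rest in density order, so each parent is labeled first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


if __name__ == "__main__":
    # Hypothetical core micro-clusters: two well-separated groups.
    centers = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
    weights = [10, 4, 3, 9, 5, 2]
    print(density_peaks(centers, weights, n_clusters=2))
```

In a two-stage design like the one the abstract describes, a function of this kind would run only when a clustering request arrives, over the micro-clusters whose decayed weight still qualifies them as core; the online stage would only update and fade the micro-cluster summaries.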

Keywords

Density peaks clustering/Data stream/Two-stage framework/Micro-cluster/Evolving cluster/Density decay


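Funding

National Natural Science Foundation of China (62066030)

Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ201915)

Key Research and Development Program of Jiangxi Province (20192BBE50076)

Key Research and Development Program of Jiangxi Province (20203BBGL73225)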

Publication year

2024
Computer Simulation (计算机仿真)
The 17th Research Institute of China Aerospace Science and Industry Corporation

CSTPCD
Impact factor: 0.518
ISSN: 1006-9348