Research on a Dynamic Data Stream Clustering Algorithm Based on Density Peaks

Zhang Guoyi (张国一)¹, Liu Sanmin (刘三民)¹


Author information

1. School of Computer and Information, Anhui Polytechnic University, Wuhu 241000, China

Abstract

Data streams are inherently uncertain, and identifying arbitrarily shaped clusters in a data stream while limiting the influence of noise has attracted widespread attention. To address these problems, a robust density peak-based dynamic clustering algorithm for data streams is proposed. The framework consists of an online stage and an offline stage. The online stage responds to and processes continuously arriving data in real time. It reduces the influence of historical data on clustering through a non-uniform decay strategy for micro-clusters, and it dynamically weights each sample according to its distance to a micro-cluster. The offline stage builds on density peak clustering and introduces a local density calculation method that adapts to the nearest neighborhood, which mitigates the "domino effect" in the assignment phase of density peak clustering. The method can handle complex data, is not constrained by limited memory, and shows good robustness. Comparative experiments on artificial and real datasets show that the proposed algorithm outperforms the other algorithms compared and yields better clustering results.
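To make the offline stage more concrete, the following is a minimal sketch of standard density peak clustering with a k-nearest-neighbour local density. It is not the authors' method: the abstract does not give the adaptive density formula, so the fixed `k`, the Gaussian-style density, the function name `density_peaks`, and the choice of centres by the product of density and distance are all assumptions made for illustration. In a stream setting the input rows would plausibly be micro-cluster centres rather than raw samples.

```python
import numpy as np


def density_peaks(points, k, n_clusters):
    """Cluster an (n, d) array with a kNN-density variant of density peak clustering.

    Illustrative sketch only; requires k < n.
    """
    n = len(points)
    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # Local density rho: large when the k nearest neighbours are close.
    # Column 0 of each sorted row is the point itself (distance 0), so skip it.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    rho = np.exp(-np.mean(knn ** 2, axis=1))

    # delta: distance to the nearest point that comes earlier in the density
    # ordering (i.e. is at least as dense); parent: index of that point.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()
    for pos in range(1, n):
        i = order[pos]
        higher = order[:pos]
        j = higher[np.argmin(dist[i, higher])]
        delta[i], parent[i] = dist[i, j], j

    # Centres combine high density with a large distance to any denser point.
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the global density maximum is always a centre
    centers = np.argsort(-gamma)[:n_clusters]

    # Assignment: visit points from densest to sparsest and inherit the label
    # of the parent, which has already been labelled at that point.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

The final loop is the assignment phase in which the "domino effect" arises: every point inherits its parent's label, so one wrong parent link propagates to all points that descend from it. A density estimate that better reflects the local neighbourhood changes the ordering and the parent links, which is the part of the pipeline the abstract says the proposed method targets.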

Key words

data stream/clustering/non-uniform decay strategy/nearest neighborhood/density peaks


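Funding

Key Project of Natural Science Research of Anhui Higher Education Institutions (KJ2021A0516)

Anhui Provincial Natural Science Foundation (2108085MF213)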

Publication year

2024

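Journal of Changchun University of Science and Technology (Natural Science Edition)

Changchun University of Science and Technology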


CSTPCD

Impact factor: 0.432

ISSN: 1672-9870
