Research on Data Stream Dynamic Clustering Algorithm Based on Density Peaks

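Data streams are inherently uncertain, and identifying clusters of arbitrary shape in a streaming environment while coping with noise has attracted wide attention. To address these problems, a robust density-peak-based dynamic clustering algorithm for data streams is designed. Its framework consists of an online stage and an offline stage. The online stage responds immediately to continuously arriving data and processes it as it arrives: a non-uniform decay strategy for micro-clusters reduces the influence of historical data on the clustering, and each sample is weighted dynamically according to its distance to the micro-clusters. The offline stage builds on density peak clustering and introduces a local density calculation that adapts to each point's nearest neighborhood, which reduces the "domino effect" in the assignment stage of the density peak algorithm (a single misassigned point passes its wrong label on to every point assigned through it). The method can handle complex data, is not constrained by limited memory, and is robust. Comparative experiments on artificial and real datasets show that the proposed algorithm outperforms the compared algorithms and yields better clustering results.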
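The online stage is only outlined above, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' formulas. It assumes DenStream-style exponential fading, where a micro-cluster's weight is multiplied by 2^(-lambda * dt). Each micro-cluster carries its own lambda, which makes the decay non-uniform. The specific rule used here, lambda = base_rate / (1 + weight), is hypothetical. An absorbed sample is weighted by exp(-d / radius), where d is its distance to the nearest micro-cluster; this weighting function is also hypothetical. All names (`MicroCluster`, `process_point`, `radius`, `base_rate`, `min_weight`) are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Hypothetical micro-cluster summary: a centre, a time-decayed weight and its own decay rate."""
    center: list[float]
    weight: float
    last_update: float
    decay_rate: float

    def fade(self, now: float) -> None:
        # DenStream-style exponential fading 2^(-lambda * dt); lambda is per cluster here (assumption).
        self.weight *= 2.0 ** (-self.decay_rate * (now - self.last_update))
        self.last_update = now


def process_point(point, now, clusters, radius=1.0, base_rate=0.05, min_weight=0.1):
    """Online step: fade all micro-clusters, prune stale ones, then absorb the point or open a new cluster."""
    for mc in clusters:
        # Hypothetical non-uniform rule: heavier (better supported) clusters fade more slowly.
        mc.decay_rate = base_rate / (1.0 + mc.weight)
        mc.fade(now)
    # Drop micro-clusters whose faded weight is negligible; this keeps memory bounded.
    clusters[:] = [mc for mc in clusters if mc.weight >= min_weight]

    if clusters:
        nearest = min(clusters, key=lambda mc: math.dist(point, mc.center))
        d = math.dist(point, nearest.center)
        if d <= radius:
            # Hypothetical distance-based sample weight: closer samples count more.
            w = math.exp(-d / radius)
            total = nearest.weight + w
            # Move the centre to the weighted mean of its old position and the new sample.
            nearest.center = [(nearest.weight * c + w * x) / total
                              for c, x in zip(nearest.center, point)]
            nearest.weight = total
            return nearest

    new_mc = MicroCluster(center=list(point), weight=1.0, last_update=now, decay_rate=base_rate)
    clusters.append(new_mc)
    return new_mc
```

For example, feeding (0, 0) at time 0 and (0.1, 0) at time 1 into an empty list yields one micro-cluster with weight of about 1.89. A later point at (5, 5) lies outside `radius` and opens a second micro-cluster. Fading shrinks the contribution of old samples, which is the role the decay strategy plays in the online stage.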
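The offline stage builds on density peak clustering (DPC). Each point receives a local density rho_i and a distance delta_i to its nearest higher-density point. Points with a large rho_i * delta_i become cluster centres, and every other point inherits the label of its nearest higher-density neighbour. This chained assignment is where the "domino effect" arises. The abstract does not give the authors' nearest-neighborhood-adaptive density, so this sketch substitutes the common k-nearest-neighbour density used by DPC-KNN: rho_i = exp(-(1/k) * sum of d_ij^2 over the k nearest neighbours j of i. The names `knn_density`, `density_peaks`, `k` and `n_clusters` are hypothetical. As a safeguard, the highest-density point is always treated as a centre, so every assignment chain has a labelled root.

```python
import math


def knn_density(points, k):
    """DPC-KNN-style density (a stand-in, not the paper's method): exp(-mean squared kNN distance)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        rho.append(math.exp(-sum(d * d for d in nearest) / len(nearest)))
    return dist, rho


def density_peaks(points, k=3, n_clusters=2):
    """Density peak clustering: pick centres by rho * delta, then assign along higher-density chains."""
    dist, rho = knn_density(points, k)
    order = sorted(range(len(points)), key=lambda i: -rho[i])  # decreasing density
    delta = [0.0] * len(points)
    parent = [None] * len(points)  # nearest point that comes earlier (denser) in `order`
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])  # convention for the densest point
            continue
        j = min(order[:pos], key=lambda m: dist[i][m])
        delta[i], parent[i] = dist[i][j], j
    gamma = [r * d for r, d in zip(rho, delta)]
    ranked = sorted(order[1:], key=lambda i: -gamma[i])
    centers = [order[0]] + ranked[: n_clusters - 1]
    labels = [-1] * len(points)
    for c_id, c in enumerate(centers):
        labels[c] = c_id
    # One pass in decreasing density: each parent is labelled before its children, so a
    # wrong label on one point propagates to everything below it (the "domino effect").
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

On two well-separated groups of four points, `density_peaks(points, k=3, n_clusters=2)` puts each group in its own cluster. The final loop makes the domino effect concrete: each label is copied from a single parent. A density estimate that ranks points more reliably therefore shortens the damage a single bad assignment can do, which is the motivation the abstract gives for an adaptive local density.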

Keywords: data stream; clustering; non-uniform decay strategy; nearest neighborhood; density peaks

Zhang Guoyi (张国一), Liu Sanmin (刘三民)

School of Computer and Information, Anhui Polytechnic University, Wuhu 241000, China

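Funding: Key Natural Science Research Project of Higher Education Institutions of Anhui Province (KJ2021A0516); Natural Science Foundation of Anhui Province (2108085MF213)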

2024

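Journal of Changchun University of Science and Technology (Natural Science Edition)
Changchun University of Science and Technology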

CSTPCD
Impact factor: 0.432
ISSN: 1672-9870
Year, Volume (Issue): 2024, 47(4)