Big Data Research (大数据), 2024, Vol. 10, Issue 3: 133-148. DOI: 10.11959/j.issn.2096-0271.2024007

Spectral clustering ensemble algorithm based on three-order tensor for large-scale data

Wu Yunzheng (仵匀政) 1, Du Tao (杜韬) 2, Zhou Jin (周劲) 2, Chen Di (陈迪) 1, Wang Xingeng (王心耕) 1

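Author information

  • 1. School of Information Science and Engineering, University of Jinan, Jinan 250024, Shandong, China
  • 2. School of Information Science and Engineering, University of Jinan, Jinan 250024, Shandong, China; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan 250024, Shandong, China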

Abstract

To reduce the computational burden of spectral clustering on large-scale data and to further improve clustering accuracy and robustness, a spectral clustering ensemble algorithm based on a three-order tensor is proposed. First, a hybrid representative nearest-neighbor approximation method is proposed to construct a sparse affinity sub-matrix between the data. The sparse affinity sub-matrix is then represented as a bipartite graph, and preliminary clustering results are obtained by graph partitioning. Finally, a three-order tensor ensemble method is proposed to fuse the multiple clustering results into the final clustering result. In experiments on large-scale real and synthetic datasets, the algorithm outperforms classical spectral clustering algorithms, clustering ensemble algorithms, and recent improved variants of both.
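The abstract does not describe the three-order tensor ensemble step in detail, so the following is only a rough sketch of the general idea of fusing several base clusterings. It stacks one co-association matrix per base clustering into a three-way array and then applies a plain co-association (evidence accumulation) consensus, which is a stand-in technique and not the method proposed in the paper. The function names, the `threshold` parameter, and the toy labelings are all hypothetical.

```python
def coassociation_tensor(labelings):
    """Build an m x n x n array: slice t holds 1 where clustering t
    puts points i and j in the same cluster, else 0."""
    n = len(labelings[0])
    return [
        [[1 if lab[i] == lab[j] else 0 for j in range(n)] for i in range(n)]
        for lab in labelings
    ]


def consensus_labels(labelings, threshold=0.5):
    """Stand-in consensus (assumption, not the paper's tensor method):
    average the slices, link pairs whose average co-association exceeds
    `threshold`, and label the connected components of that graph."""
    tensor = coassociation_tensor(labelings)
    m, n = len(tensor), len(labelings[0])
    # Collapse the clustering mode of the tensor by averaging.
    avg = [
        [sum(tensor[t][i][j] for t in range(m)) / m for j in range(n)]
        for i in range(n)
    ]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over strongly co-associated points.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and avg[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical base clusterings of six points: the second is the
# first with permuted label names, the third splits the data differently.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
print(consensus_labels(base))  # → [0, 0, 0, 1, 1, 1]
```

Working with co-association slices rather than raw labels makes the fusion insensitive to how each base clustering happens to name its clusters, which is why the first two labelings above count as identical evidence.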

Key words

data clustering/large-scale data/spectral clustering/three-order tensor/clustering ensemble


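Funding

National Natural Science Foundation of China (62273164)

National Natural Science Foundation of China (61873324)

Natural Science Foundation of Shandong Province (ZR2019MF040)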

Publication year

2024
Big Data Research (大数据)
Posts & Telecom Press (人民邮电出版社)

CSTPCD
ISSN: 2096-0271