To address the problems that traditional K-means clustering is sensitive to the sample distribution and that the limited representational power of kernel functions leads to poor clustering performance on complex problems, a deep multiple kernel K-means (DMKK-means) clustering algorithm with strong representational ability and distributional robustness is proposed, which exploits the strong expressiveness of deep kernels through multi-kernel ensembling. A deep multiple kernel network architecture with strong representational ability is constructed, and K-means clustering is performed in the new feature space; a clustering loss function based on Kullback-Leibler (KL) divergence measures the difference between the algorithm and two baseline clustering methods; the clustering algorithm is formulated as an efficient end-to-end learning problem, and the weight parameters of the deep multiple kernel network are updated with stochastic gradient descent. Experiments on multiple standard datasets show that, compared with K-means, radial basis function kernel K-means (RBFKKM), and other multiple kernel K-means clustering algorithms, the proposed algorithm achieves clear improvements in clustering accuracy, normalized mutual information, and adjusted Rand index, verifying its feasibility and effectiveness.
DMKK-means: a deep multiple kernel K-means clustering algorithm
The proposed algorithm, deep multiple kernel K-means (DMKK-means), addresses the limitations of traditional K-means clustering, which is sensitive to the sample distribution and performs poorly on complex problems because of the limited expressive power of its kernel representation. Leveraging the strong representational capability of deep kernels and a multi-kernel ensemble approach, DMKK-means constructs a highly expressive deep multiple kernel network architecture and performs K-means clustering in the new feature space. The dissimilarity between this algorithm and two baseline clustering methods is quantified by a clustering loss function based on Kullback-Leibler (KL) divergence. The clustering algorithm is modeled as an efficient end-to-end learning problem, and the weight parameters of the deep multiple kernel network are optimized by stochastic gradient descent. Experimental results on multiple standard datasets demonstrate the superiority of the proposed algorithm over K-means, radial basis function kernel K-means (RBFKKM), and other multiple kernel K-means clustering algorithms in terms of clustering accuracy, normalized mutual information, and adjusted Rand index, validating its feasibility and effectiveness.
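The pipeline summarized above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the RBF base kernels, the two-layer kernel composition standing in for the deep multiple kernel network, and the Student-t soft assignments with a sharpened target distribution (a common choice in deep clustering) are all assumptions, and the stochastic gradient update of the kernel weights is omitted — only the forward pass and the KL clustering loss are shown.

```python
import numpy as np

def rbf_kernel(X, gamma):
    # Gaussian (RBF) kernel from pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def deep_multiple_kernel(X, gammas, weights, depth=2):
    # Layer 1: convex combination of base RBF kernels (the multi-kernel part).
    K = sum(w * rbf_kernel(X, g) for w, g in zip(weights, gammas))
    # Deeper layers: a Gaussian kernel over the metric induced by the previous
    # layer's kernel — an illustrative stand-in for a learned deep composition.
    for _ in range(depth - 1):
        diag = np.diag(K)
        d2 = diag[:, None] + diag[None, :] - 2.0 * K
        K = np.exp(-np.maximum(d2, 0.0))
    return K

def kernel_kmeans_embed(K, n_clusters, n_iter=50, seed=0):
    # Kernel K-means via spectral embedding of K, then Lloyd's algorithm.
    vals, vecs = np.linalg.eigh(K)
    Z = vecs[:, -n_clusters:] * np.sqrt(np.maximum(vals[-n_clusters:], 0.0))
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), n_clusters, replace=False)]
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(0)
    return labels, Z, centers

def kl_clustering_loss(Z, centers):
    # Student-t soft assignments and a sharpened target distribution;
    # KL(P || Q) serves as the clustering loss (assumed form).
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    q = 1.0 / (1.0 + d2)
    q /= q.sum(1, keepdims=True)
    p = q ** 2 / q.sum(0)
    p /= p.sum(1, keepdims=True)
    return float(np.sum(p * np.log(p / q)))
```

In an end-to-end version, the kernel weights (and any deep-kernel parameters) would be treated as learnable and updated by stochastic gradient descent on this KL loss, which is what turns the fixed pipeline above into a trainable clustering model.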