A Parallel Optimized K-means Algorithm Scalable with Data Size

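Abstract (translated from Chinese): The traditional K-means algorithm has to load all of the sample data to be clustered during its iterations, and its class-center update step is not parallel. To address the small data scale that traditional K-means can handle and its slow class-center updates, an improved K-means algorithm is proposed. It targets two problems: scaling the amount of data that K-means can process on a single machine, and low processor utilization. Experiments verify that the method can cluster large-scale data efficiently.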

English title (as published): Parallel Optimization K-means Algorithm Facing the Data Size Scalable

English abstract: The traditional K-means algorithm needs to load all of the sample data into memory, and updating the class centers is a non-parallel process. To address the small amount of data that traditional K-means can process and its slow class-center updates, this paper proposes an improved K-means algorithm that targets the problems of data-scale expansion and inefficient processor utilization. Experiments show that the method can efficiently cluster large-scale data.
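The abstract does not spell out the improved algorithm, so the following is only a generic sketch of the two ideas it names, not the author's method: samples are processed in blocks rather than as one in-memory array, and the class-center update is computed as per-block partial sums that are reduced afterwards. All names (`nearest`, `partial_stats`, `update_centers`, `kmeans`) are hypothetical. A thread pool is used to keep the example self-contained; in CPython it mainly illustrates the structure, and a process pool or native code would be needed for a real CPU-bound speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
    )


def partial_stats(chunk, centers):
    """Per-cluster coordinate sums and counts for one block of samples."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for point in chunk:
        j = nearest(point, centers)
        counts[j] += 1
        for d, value in enumerate(point):
            sums[j][d] += value
    return sums, counts


def update_centers(chunks, centers, workers=4):
    """One iteration: compute block statistics concurrently, then reduce."""
    dim = len(centers[0])
    total_sums = [[0.0] * dim for _ in centers]
    total_counts = [0] * len(centers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda ch: partial_stats(ch, centers), chunks)
        for sums, counts in results:
            for j in range(len(centers)):
                total_counts[j] += counts[j]
                for d in range(dim):
                    total_sums[j][d] += sums[j][d]
    # Assumption: a cluster that received no samples keeps its old center.
    return [
        [s / total_counts[j] for s in total_sums[j]]
        if total_counts[j]
        else list(centers[j])
        for j in range(len(centers))
    ]


def kmeans(chunks, centers, iterations=10, workers=4):
    """Run a fixed number of iterations; chunks must be re-iterable."""
    for _ in range(iterations):
        centers = update_centers(chunks, centers, workers)
    return centers
```

Because each block contributes only a sum and a count per cluster, the reduction needs memory proportional to the number of clusters rather than the number of samples. In an out-of-core setting, each block would be read from disk inside the worker instead of being passed in as a list.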

English keywords:

K-means; Large-Scale; Updating Class Centers; Parallel

Author:

李尧坤 (Li Yaokun)

Author affiliation:

College of Computer Science, Sichuan University, Chengdu 610065

Keywords:

K-means; large-scale; updating class centers; parallel

Publication year:

2015

DOI:

10.3969/j.issn.1007-1423.2015.02.001

现代计算机(普及版) (Modern Computer, Popular Edition)

Sun Yat-sen University

Impact factor: 0.202

ISSN: 1007-1423

Year, volume (issue): 2015, (1)

Number of references: 1