A Parallel Optimized K-means Algorithm Scalable to Data Size
Li Yaokun (李尧坤)1
Author information
- 1. College of Computer Science, Sichuan University, Chengdu 610065
Abstract
The traditional K-means algorithm must load all of the sample data into memory during iteration, and its class-center update step is not parallel. To address the resulting limits on the data size that can be processed and the slow class-center updates, this paper proposes an improved K-means algorithm that targets two problems: scaling the data size that K-means can handle on a single machine, and low processor utilization. Experiments show that the method can efficiently cluster large-scale data.
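The abstract does not give the details of the improved algorithm, so the following is only a minimal sketch of one common way to combine the two ideas it names: the data is processed block by block rather than as one in-memory array, and the class-center update is computed from per-block partial sums that workers produce concurrently. The function names `partial_stats` and `chunked_parallel_kmeans`, the in-memory list of chunks (a stand-in for blocks read from disk), and the use of a thread pool are all assumptions made for illustration, not the author's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk, centers):
    """Assign each point of one chunk to its nearest center.

    Returns per-cluster coordinate sums and point counts for this chunk only.
    """
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        # Index of the nearest center by squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += point[d]
    return sums, counts


def chunked_parallel_kmeans(chunks, centers, n_iter=10, workers=4):
    """Hypothetical helper: K-means over a list of data chunks.

    Each chunk is processed by a worker; only the small per-cluster sums
    and counts are combined, so no step needs all points at once.
    """
    centers = [[float(x) for x in c] for c in centers]
    k, dim = len(centers), len(centers[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(n_iter):
            current = centers  # centers used by every worker in this pass
            results = list(pool.map(lambda ch: partial_stats(ch, current), chunks))
            new_centers = []
            for j in range(k):
                count = sum(r[1][j] for r in results)
                if count == 0:
                    # An empty cluster keeps its previous center.
                    new_centers.append(current[j])
                else:
                    new_centers.append(
                        [sum(r[0][j][d] for r in results) / count for d in range(dim)]
                    )
            centers = new_centers
    return centers
```

For example, `chunked_parallel_kmeans([[(0, 0), (0, 2)], [(10, 10), (10, 12)]], [(0, 0), (10, 10)], n_iter=2)` processes two chunks independently and merges their statistics into two centers. In CPython, pure-Python threads do not run CPU-bound code in parallel, so a real speedup would need a process pool or vectorized numerical code; the thread pool here only keeps the sketch short and self-contained.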
Key words
K-means/Large-Scale/Updating Class Centers/Parallel
Publication year
2015