GEP_K-means Clustering Algorithm Based on MapReduce

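To address the low computational efficiency of cluster-center generation and fitness evaluation in the K-means clustering algorithm based on gene expression programming (GEP_K-means), a GEP_K-means clustering algorithm based on the MapReduce framework is proposed. The MapReduce distributed parallel programming model is used to parallelize fitness evaluation and thereby reduce processing time, and a linear data structure is used to operate directly on chromosome genes, lowering the time and space complexity of decoding chromosome genes into cluster centers. The algorithm's performance is verified through simulation experiments on the Hadoop platform. The experimental results show that the algorithm achieves good speedup and scalability with no additional space overhead, and that it is suitable for cluster analysis of large-scale data sets whose number of clusters is unknown.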
GEP_K-means Clustering Algorithm Based on MapReduce
To improve the computational efficiency of cluster center generation and fitness evaluation in the K-means clustering algorithm based on Gene Expression Programming (GEP), this paper proposes a hybrid clustering algorithm of K-means and GEP based on the MapReduce framework. MapReduce, a distributed parallel programming model, is used to parallelize the computation of fitness evaluation in order to reduce processing time, and a linear data structure is used to operate directly on chromosome genes in order to reduce the time and space complexity of expressing genes to obtain the cluster centers. The algorithm is verified on Hadoop by simulation. Experimental results show that the algorithm achieves high speedup and good scalability with no extra space overhead, and is suited to clustering analysis of massive data.
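
The parallel fitness evaluation described above can be illustrated with a minimal sketch. It is not the paper's implementation: it simulates the map and reduce phases in plain Python rather than on Hadoop, it assumes each chromosome has already been decoded into a list of cluster centers, and it uses 1 / (1 + SSE) as a placeholder fitness function, since the abstract does not state the actual one. All function and variable names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]
Centers = List[Point]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points: Iterable[Point],
              population: Dict[int, Centers]) -> Iterator[Tuple[int, float]]:
    """Mapper: for every point in one data split and every chromosome,
    emit (chromosome_id, squared distance to the nearest center)."""
    for p in points:
        for chrom_id, centers in population.items():
            yield chrom_id, min(squared_distance(p, c) for c in centers)


def reduce_phase(pairs: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Reducer: sum the emitted distances per chromosome (total SSE)."""
    sse: Dict[int, float] = defaultdict(float)
    for chrom_id, dist in pairs:
        sse[chrom_id] += dist
    return dict(sse)


def evaluate_fitness(splits: List[List[Point]],
                     population: Dict[int, Centers]) -> Dict[int, float]:
    """Run the map phase over each split, then reduce.
    On a cluster each split would go to a separate mapper; here the
    splits are processed sequentially for illustration only."""
    pairs = [kv for split in splits for kv in map_phase(split, population)]
    sse = reduce_phase(pairs)
    # Assumed placeholder fitness: a lower SSE gives a higher fitness.
    return {chrom_id: 1.0 / (1.0 + s) for chrom_id, s in sse.items()}


# Two data splits and two candidate solutions with different cluster counts.
splits = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
population = {
    0: [(0.0, 0.5), (10.0, 10.5)],  # two centers
    1: [(5.0, 5.0)],                # one center
}
fitness = evaluate_fitness(splits, population)
print(fitness)
```

Because each mapper only needs its own data split plus the small set of decoded centers, the per-point distance work can be spread across nodes, which is the part of the computation the abstract reports as parallelized. A fitness based purely on SSE favors solutions with more centers, so a method aimed at an unknown number of clusters would need a different criterion; the placeholder here only shows the data flow.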

K-means; Gene Expression Programming (GEP); MapReduce; Parallel; Massive Data

古凌岚


Department of Computer Engineering, Guangdong Industry Polytechnic (广东轻工职业技术学院), Guangzhou 510300

K-means; gene expression programming; MapReduce; parallel; large data sets

Scientific research project of the Guangdong Provincial Archives Bureau (广东省档案局)

YDK-95-2014

2015

Modern Computer (Popular Edition) (现代计算机(普及版))
Sun Yat-sen University (中山大学)


Impact factor: 0.202
ISSN: 1007-1423
Year, volume (issue): 2015, (7)