GEP_K-means Clustering Algorithm Based on MapReduce

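Abstract (translated from Chinese): To address the low computational efficiency of cluster-center generation and fitness evaluation in the K-means clustering algorithm based on gene expression programming (GEP_K-means), this paper proposes a GEP_K-means clustering algorithm built on the MapReduce framework. The fitness-evaluation step is parallelized with the MapReduce distributed parallel programming model to reduce processing time, and a linear data structure is used to operate directly on chromosome genes, which lowers the time and space complexity of expressing chromosome genes to generate cluster centers. The algorithm's performance is verified through simulation experiments on the Hadoop platform. The experimental results show that the algorithm achieves good speedup and scalability with no extra space overhead, and that it is suitable for cluster analysis of large-scale datasets where the number of clusters is unknown.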

English title: GEP_K-means Clustering Algorithm Based on MapReduce

English abstract: To improve the computational efficiency of cluster center generation and fitness evaluation in the K-means clustering algorithm based on Gene Expression Programming (GEP), this paper proposes a hybrid clustering algorithm of K-means and GEP based on the MapReduce framework. MapReduce, a distributed parallel programming model, is used to parallelize the fitness evaluation in order to reduce processing time, and a linear data structure is used to operate directly on chromosome genes in order to reduce the time and space complexity of gene expression when solving for the cluster centers. The algorithm is verified on Hadoop by simulation. Experimental results show that the algorithm achieves high speedup and good scalability with no extra space overhead, and that it is suitable for cluster analysis of massive data.
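The parallel fitness evaluation described in the abstract can be pictured with a small in-process sketch. The abstract does not state the fitness function or the key/value layout, so everything here is an assumption for illustration: fitness is taken to be the within-cluster sum of squared errors (SSE), and the names `map_point`, `reduce_by_key`, and `sse_fitness` are hypothetical. The code imitates the map and reduce phases in plain Python and does not use the Hadoop API.

```python
from collections import defaultdict


def map_point(point, centers):
    """Map step (assumed): emit (index of nearest center, squared distance)."""
    distances = (
        (i, sum((p - c) ** 2 for p, c in zip(point, center)))
        for i, center in enumerate(centers)
    )
    # Pick the center with the smallest squared Euclidean distance.
    return min(distances, key=lambda pair: pair[1])


def reduce_by_key(pairs):
    """Reduce step (assumed): sum the squared distances for each cluster index."""
    totals = defaultdict(float)
    for cluster_idx, d2 in pairs:
        totals[cluster_idx] += d2
    return dict(totals)


def sse_fitness(points, centers):
    """Total SSE of one candidate set of centers; lower is better."""
    per_cluster = reduce_by_key(map_point(p, centers) for p in points)
    return sum(per_cluster.values())


if __name__ == "__main__":
    points = [(0, 0), (0, 2), (10, 10), (10, 12)]
    centers = [(0, 1), (10, 11)]  # centers decoded from one hypothetical chromosome
    print(sse_fitness(points, centers))
```

Because each call to `map_point` depends only on one data point and the candidate centers, the map phase can be spread across data splits, which is the property a MapReduce implementation would rely on.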

English keywords:

K-means; Gene Expression Programming (GEP); MapReduce; Parallel; Massive Data

Author:

Gu Linglan (古凌岚)

Author affiliation:

Department of Computer Engineering, Guangdong Industry Polytechnic (广东轻工职业技术学院), Guangzhou 510300

Keywords:

K-means; gene expression programming; MapReduce; parallel; large datasets

Funding:

Scientific research project of the Guangdong Provincial Archives Bureau (广东省档案局)

Project number:

YDK-95-2014

Publication year:

2015

DOI:

10.3969/j.issn.1007-1423.2015.20.003

现代计算机(普及版) (Modern Computer, Popular Edition)

Sun Yat-sen University (中山大学)

Impact factor: 0.202

ISSN: 1007-1423

Year, volume (issue): 2015, (7)

Number of references: 6