GEP_K-means Clustering Algorithm Based on MapReduce
To improve the computational efficiency of cluster-center generation and fitness evaluation in the K-means clustering algorithm based on Gene Expression Programming (GEP), this paper proposes a hybrid K-means and GEP clustering algorithm built on the MapReduce framework. MapReduce, a distributed parallel programming model, is used to parallelize fitness evaluation and thereby reduce processing time. A linear data structure that operates directly on chromosome genes is used to reduce the time and space complexity of expressing genes to obtain the cluster centers. The algorithm is verified by simulation on Hadoop. Experimental results show that it achieves high speedup and good stability with no extra space overhead, making it suitable for cluster analysis of massive data.
Keywords: K-means; Gene Expression Programming (GEP); MapReduce; Parallel; Massive Data
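
The parallel fitness evaluation described in the abstract can be illustrated with a minimal single-machine sketch in Python. It only mimics the shape of a map step and a reduce step and is not the paper's Hadoop implementation. The function names (map_point, reduce_partials, fitness) are hypothetical, and the fitness measure used here, 1 / (1 + sum of squared distances to the nearest center), is an assumption, since the abstract does not define the fitness function.

```python
from collections import defaultdict


def map_point(point, centers):
    """Map step: emit (index of nearest center, squared distance to it)."""
    best_idx, best_d2 = 0, float("inf")
    for idx, center in enumerate(centers):
        d2 = sum((p - c) ** 2 for p, c in zip(point, center))
        if d2 < best_d2:
            best_idx, best_d2 = idx, d2
    return best_idx, best_d2


def reduce_partials(pairs):
    """Reduce step: sum the squared distances emitted for each cluster key."""
    totals = defaultdict(float)
    for key, d2 in pairs:
        totals[key] += d2
    return dict(totals)


def fitness(points, centers):
    """Assumed fitness: 1 / (1 + SSE); a smaller clustering error scores higher."""
    # On a real cluster the map calls would run in parallel over data splits.
    pairs = [map_point(p, centers) for p in points]
    per_cluster = reduce_partials(pairs)
    sse = sum(per_cluster.values())
    return 1.0 / (1.0 + sse)


if __name__ == "__main__":
    points = [(0, 0), (0, 2), (10, 10)]
    centers = [(0, 1), (10, 10)]  # candidate centers decoded from one chromosome
    print(fitness(points, centers))
```

Because each map call depends only on one point and the candidate centers, the points can be split across workers with no shared state, which is the property that makes fitness evaluation a natural fit for MapReduce.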