Cloud computing-based cluster analysis of power utilization data
To address the huge volume of electricity data and the low efficiency of clustering in data mining, this paper combines theoretical analysis with experiments. It analyzes the architecture for parallel study of electricity data and examines two typical clustering algorithms, Canopy and K-means. On this basis, a new clustering approach is proposed. The approach first uses Canopy to coarsely partition the electricity data and obtain the number of clusters and the cluster centers, then uses K-means to refine the clustering. It thereby keeps the advantages of K-means, namely a simple algorithm and fast convergence, while making it less likely to fall into a local optimum. To handle huge amounts of data, the proposed algorithm is implemented on the MapReduce framework. The results show that the proposed algorithm is efficient and feasible for processing huge amounts of electricity data and achieves a good speedup ratio.
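The two-stage idea described above can be illustrated with a minimal single-machine sketch. It is not the paper's MapReduce implementation: the function names (`canopy_centers`, `kmeans`), the thresholds `t1` and `t2`, the choice to seed K-means with the mean of each canopy's members, and the toy data are all assumptions made for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def canopy_centers(points, t1, t2):
    """Coarse pass: return one initial center per canopy.

    t1 is the loose threshold (canopy membership), t2 the tight
    threshold (removal from the candidate list); t1 must exceed t2.
    Assumption: each canopy is summarized by the mean of its members.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining.pop(0)
        members = [p for p in points if dist(p, seed) <= t1]
        centers.append(mean(members))
        # Points tightly bound to this seed cannot start a new canopy.
        remaining = [p for p in remaining if dist(p, seed) > t2]
    return centers


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Refinement pass: standard K-means started from the given centers."""
    centers = list(centers)

    def nearest(p):
        return min(range(len(centers)), key=lambda i: dist(p, centers[i]))

    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p)].append(p)
        # Keep the old center if a cluster received no points.
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    labels = [nearest(p) for p in points]
    return centers, labels


# Hypothetical toy data: two features per consumer, not from the paper.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
initial = canopy_centers(data, t1=3.0, t2=1.5)
final_centers, labels = kmeans(data, initial)
print(len(initial), labels)
```

In this sketch the number of clusters is not supplied by the user; it is the number of canopies found by the coarse pass, which is the role the abstract assigns to Canopy. In a MapReduce setting, the assignment of points to the nearest center would typically be done in the map phase and the recomputation of centers in the reduce phase, but that division is not shown here.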