
Parallel clustering analysis of massive electricity consumption data

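To address the large volume of electricity consumption data and the low efficiency of mining it, a parallel analysis architecture for electricity consumption data is studied through theoretical analysis and experiment. Two typical clustering algorithms, Canopy and K-means, are examined, and a new clustering approach is proposed: Canopy first processes the electricity consumption data coarsely to obtain the number of clusters and the cluster centers, and K-means then performs the precise clustering. This keeps the simplicity and fast convergence of K-means while making it less prone to falling into a local optimum. To handle massive data, the proposed algorithm is deployed on the MapReduce framework for experiments. The results show that the proposed algorithm is efficient and feasible for processing massive electricity consumption data and has a good speedup ratio.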
Cloud computing based cluster analysis on data of power utilization
Aiming at the large volume of electricity data and the low efficiency of clustering in data mining, this paper uses theoretical analysis and experiment to study a parallel analysis architecture for electricity data and examines two typical clustering algorithms, Canopy and K-means. A new clustering approach is proposed. The approach uses Canopy for a rough pass over the electricity data to obtain the number of clusters and the cluster centers, and then uses K-means for precise clustering. It thereby exploits the simplicity and fast convergence of K-means while making the algorithm less likely to fall into a local optimum. To handle huge amounts of data, the proposed algorithm is deployed on the MapReduce framework. The results show that the proposed algorithm is efficient and feasible for processing huge amounts of electricity data and achieves a good speedup ratio.
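The two-stage idea in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the thresholds `t1` and `t2`, the choice of the canopy mean as the K-means seed, and the toy two-dimensional data are all assumptions made for illustration. The K-means step is split into a map-like assignment function and a reduce-like averaging function only to suggest how one iteration might be laid out on MapReduce.

```python
import math
from collections import defaultdict


def distance(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reduce_mean(points):
    """Reduce-like step: component-wise mean of the points in one group."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def canopy_centers(points, t1, t2):
    """Coarse pass: return one seed center per canopy (requires t1 > t2).

    Assumption: the first remaining point opens each canopy, and the seed
    is the mean of all remaining points within the loose threshold t1.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = [tuple(p) for p in points]
    seeds = []
    while remaining:
        center = remaining.pop(0)
        members = [center] + [p for p in remaining if distance(p, center) <= t1]
        seeds.append(reduce_mean(members))
        # Points inside the tight threshold t2 cannot open another canopy.
        remaining = [p for p in remaining if distance(p, center) > t2]
    return seeds


def map_assign(point, centers):
    """Map-like step: emit (index of nearest center, point)."""
    idx = min(range(len(centers)), key=lambda i: distance(point, centers[i]))
    return idx, point


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Refine the seed centers; the cluster count is fixed by the seeds."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = defaultdict(list)
        for p in points:
            idx, pt = map_assign(p, centers)
            groups[idx].append(pt)
        # Keep the old center if a cluster received no points.
        new_centers = [
            reduce_mean(groups[i]) if groups[i] else centers[i]
            for i in range(len(centers))
        ]
        shift = max(distance(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    labels = [map_assign(p, centers)[0] for p in points]
    return centers, labels


if __name__ == "__main__":
    # Hypothetical toy data standing in for two-feature load profiles.
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
    seeds = canopy_centers(data, t1=3.0, t2=1.5)
    centers, labels = kmeans(data, seeds)
    print(len(seeds), labels)
```

Because the Canopy pass fixes both the number of clusters and their starting positions, K-means no longer depends on a user-supplied k or on random initialization, which is the property the abstract credits for reducing the risk of a poor local optimum.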

K-means algorithm; Canopy algorithm; cloud computing; MapReduce framework; clustering

Liu Xiaoyue, Guo Qiang


College of Electrical Engineering, North China University of Science and Technology, Tangshan 063009, Hebei

K-means algorithm; Canopy algorithm; cloud computing; MapReduce framework; clustering

2016

Journal of Liaoning Technical University (Natural Science)
Liaoning Technical University


CSTPCD; Peking University Core Journals
Impact factor: 0.722
ISSN:1008-0562
Year, volume (issue): 2016, 35(1)