Improved K-means Photovoltaic Energy Data Cleaning Method Based on Autoencoder

The development of smart grids has produced massive volumes of energy data, and data quality is the foundation for tasks such as data value mining. However, abnormal data inevitably arise during the collection and transmission of large-scale, multi-source photovoltaic energy data, so data cleaning is required. Existing data cleaning models based on traditional statistical machine learning have certain limitations. This paper proposes an improved K-means clustering model based on a Transformer autoencoder structure for cleaning energy big data. The model adaptively determines the number of clusters using the elbow method and uses an autoencoder network to compress and reconstruct the data within each cluster, thereby detecting and recovering abnormal data. The model also employs the Transformer's multi-head attention mechanism to learn correlated features among the data, which improves its ability to screen for abnormal data. Experiments on a publicly available photovoltaic power generation dataset show that the proposed model detects abnormal data better than other methods, with a screening accuracy above 96%. Moreover, it can recover abnormal data to a certain extent, providing effective support for energy big data applications.
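The pipeline described in the abstract (choose the cluster count with the elbow method, reconstruct the data inside each cluster, flag and repair points that reconstruct poorly) can be illustrated with a small, hedged sketch. This is not the authors' implementation: the Transformer autoencoder is swapped for the simplest possible stand-in, in which each point is "reconstructed" as its cluster centroid. The elbow is taken as the largest second difference of the inertia curve, and the `ratio=3.0` cutoff on the median residual, the function names, and the synthetic data are all assumptions made for illustration.

```python
import math
import statistics


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def kmeans(points, k, iters=50):
    # Deterministic farthest-point initialisation, then Lloyd iterations.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = mean_point(members)
    inertia = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, centers, inertia


def elbow_k(points, k_max=6):
    runs = {k: kmeans(points, k) for k in range(1, k_max + 1)}

    # Elbow heuristic (an assumption): k with the largest second
    # difference of the inertia curve.
    def bend(k):
        return runs[k - 1][2] - 2 * runs[k][2] + runs[k + 1][2]

    best = max(range(2, k_max), key=bend)
    return best, runs[best]


def clean(points, ratio=3.0):
    k, (labels, centers, _) = elbow_k(points)
    # Stand-in for autoencoder reconstruction error: distance to the centroid.
    residual = [math.sqrt(sq_dist(p, centers[l])) for p, l in zip(points, labels)]
    flagged = []
    for j in range(k):
        idx = [i for i, l in enumerate(labels) if l == j]
        cutoff = ratio * statistics.median(residual[i] for i in idx)
        flagged += [i for i in idx if residual[i] > cutoff]
    # "Recover" each flagged point as the mean of its unflagged cluster peers.
    repaired = list(points)
    for i in flagged:
        peers = [points[m] for m, l in enumerate(labels)
                 if l == labels[i] and m not in flagged]
        repaired[i] = mean_point(peers)
    return k, sorted(flagged), repaired


# Synthetic data (hypothetical): three tight 3x3 grids plus one injected anomaly.
offsets = [(dx, dy) for dx in (-0.4, 0.0, 0.4) for dy in (-0.4, 0.0, 0.4)]
centres = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)]
data = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]
data.append((12.5, 0.0))  # injected anomaly at index 27

k, flagged, repaired = clean(data)
print(k, flagged, repaired[27])
```

On this toy data the sketch selects three clusters and flags only the injected point. A median-based cutoff is used rather than a mean-plus-three-sigma rule because, in small clusters, a single large outlier inflates the standard deviation enough to hide itself. A learned autoencoder, as in the paper, would replace the centroid reconstruction with a model that captures structure inside each cluster.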

Keywords: Autoencoder; Data cleaning; Anomaly detection; Transformer; K-means

Authors: Peng Bo (彭勃), Li Yaodong (李耀东), Gong Xianfu (龚贤夫)

Affiliation: Power Grid Planning Research Center, Guangdong Power Grid Co., Ltd., Guangzhou 510080, China

Funding: Science and Technology Project of China Southern Power Grid Co., Ltd. (037700KK52220042, GDKJXM20220906)

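Journal: Computer Science (计算机科学), 2024, 51(z1)
Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)
Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.944
ISSN: 1002-137X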