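Traditional anomalous electricity usage detection suffers from low detection accuracy because of the curse of dimensionality in high-dimensional data and the influence of irrelevant features on anomaly detection. To address these problems, an isolation forest detection algorithm based on unsupervised density subspace selection is proposed. First, an effective density-based compact data representation is proposed, which improves the efficiency of the subspace selection strategy. Then, the min-redundancy-maximum-relevance-to-density (mRMRD) criterion is applied to select relevant subspaces based on mutual information. Finally, isolation trees are built in the relevant subspaces and combined into an isolation forest to detect anomalous electricity usage data. Experimental analysis shows that, compared with traditional detection algorithms, the proposed method improves accuracy, area under the ROC curve (AUC), and F1 score, and so improves anomalous electricity usage detection. A sensitivity analysis also verifies the effectiveness of the unsupervised density subspace isolation forest detection algorithm.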
Anomalous Electricity Usage Detection Based on Density Subspace Isolation Forest
Traditional anomalous electricity usage detection achieves low accuracy on high-dimensional data because of the curse of dimensionality and the influence of irrelevant features. To address these problems, an isolation forest detection algorithm based on unsupervised density subspace selection is proposed. First, an effective density-based compact data representation is proposed to improve the efficiency of the subspace selection strategy. Then, the min-redundancy-maximum-relevance-to-density (mRMRD) criterion is applied to select relevant subspaces based on mutual information. Finally, isolation trees are constructed in the relevant subspaces and integrated into an isolation forest to detect anomalous electricity usage data. Experiments show that, compared with traditional detection algorithms, the proposed method improves accuracy, area under the ROC curve (AUC), and F1 score, and thereby improves anomalous electricity usage detection. A sensitivity analysis further verifies the effectiveness of the unsupervised density subspace isolation forest detection algorithm.
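The final step, building isolation trees only on a selected subspace, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the mRMRD computation, so the relevant subspace is simply assumed to be given as a list of feature indices. The function names, the synthetic data, and the three hypothetical features (daily consumption, peak ratio, and an irrelevant noise column) are all assumptions made for illustration. The scoring follows the standard isolation forest formula s(x) = 2^(-E[h(x)] / c(ψ)).

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, subspace, depth, max_depth, rng):
    """Isolate points with random splits drawn only from the subspace features."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feat = rng.choice(subspace)
    lo = min(r[feat] for r in rows)
    hi = max(r[feat] for r in rows)
    if lo == hi:
        # Simplification: stop here instead of retrying another feature.
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feat] < split]
    right = [r for r in rows if r[feat] >= split]
    return {
        "feat": feat,
        "split": split,
        "left": build_tree(left, subspace, depth + 1, max_depth, rng),
        "right": build_tree(right, subspace, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x lands, plus the expected depth of the unexpanded leaf."""
    if "feat" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["feat"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def fit_forest(data, subspace, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample of the data."""
    if len(data) < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [
        build_tree(rng.sample(data, psi), subspace, 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, psi


def anomaly_score(x, trees, psi):
    """Standard isolation forest score in (0, 1]; higher means more anomalous."""
    mean_depth = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(psi))


# Synthetic, hypothetical data: [daily_kwh, peak_ratio, irrelevant_noise].
data_rng = random.Random(42)
normal = [
    [data_rng.gauss(10.0, 1.0), data_rng.gauss(0.5, 0.05), data_rng.uniform(0, 100)]
    for _ in range(200)
]
outlier = [30.0, 0.95, 50.0]

# Assumed output of the subspace selection step: keep features 0 and 1 only.
relevant_subspace = [0, 1]

trees, psi = fit_forest(normal + [outlier], relevant_subspace)
outlier_score = anomaly_score(outlier, trees, psi)
normal_scores = [anomaly_score(x, trees, psi) for x in normal]
```

Restricting the splits to `relevant_subspace` means the noise column never contributes to the path length, which is the intuition behind selecting subspaces before building the forest. In practice the index list would come from the mutual-information-based selection step rather than being hard-coded.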