
Detecting Abnormal Water Extraction Data Based on Isolation Forest

To detect outliers in the water withdrawal data of water supply enterprises quickly and accurately, an unsupervised learning algorithm based on isolation forest is proposed. Water withdrawal data from four water supply enterprises (A-D), provided by the Anhui water resource intake monitoring platform, are used as a case study, and the algorithm is compared experimentally with the traditional boxplot method and the supervised k-nearest neighbor algorithm. The results show that, owing to its tree structure, the isolation forest algorithm reaches average F1 and AUC values of 0.9630 and 0.9980 in point-anomaly detection, about 56.40% and 22.47% higher than the k-nearest neighbor algorithm and about 18.92% and 9.70% higher than the boxplot method, respectively. Although its performance degrades when interval (contiguous) abnormal withdrawal behavior is simulated, its stability remains better than that of the k-nearest neighbor algorithm and the boxplot method, which indicates that the isolation forest algorithm has certain advantages in detecting different types of anomalous data.
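The abstract gives no implementation details, so the following is only a minimal sketch of how an isolation forest could score point anomalies in a one-dimensional withdrawal series and how F1 and AUC could then be computed. Everything concrete here is an assumption made for illustration and not the authors' data or settings: the synthetic series, the size of the injected spikes and drops, the tree count (100), the subsample size (256), and the top-k rule used to turn scores into flags.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Expected path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Split 1-D values at random cut points until isolated or depth-limited."""
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return ("leaf", len(values))
    cut = rng.uniform(min(values), max(values))
    left = [v for v in values if v < cut]
    right = [v for v in values if v >= cut]
    return ("node", cut,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, plus a correction for leaf size."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, cut, left, right = tree
    return path_length(left if x < cut else right, x, depth + 1)


def anomaly_scores(data, n_trees=100, sample_size=256, seed=0):
    """Score each value in (0, 1); shorter average paths give higher scores."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    forest = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
              for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-sum(path_length(t, x) for t in forest) / (n_trees * norm))
            for x in data]


def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def auc(y_true, scores):
    """Probability that a random anomaly outscores a random normal point."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic daily withdrawals (hypothetical units) with injected point anomalies.
data_rng = random.Random(42)
normal = [data_rng.gauss(1000.0, 50.0) for _ in range(300)]
spikes = [data_rng.uniform(2000.0, 4000.0) for _ in range(5)]
drops = [data_rng.uniform(0.0, 300.0) for _ in range(5)]
data = normal + spikes + drops
labels = [0] * len(normal) + [1] * (len(spikes) + len(drops))

scores = anomaly_scores(data)
k = sum(labels)  # assumed known contamination, used only to pick a threshold
threshold = sorted(scores, reverse=True)[k - 1]
flags = [int(s >= threshold) for s in scores]

f1 = f1_score(labels, flags)
auc_value = auc(labels, scores)
print(f"F1 = {f1:.3f}, AUC = {auc_value:.3f}")
```

In practice the threshold would have to be chosen without access to labels, for example from an assumed contamination rate, and a library implementation such as scikit-learn's `IsolationForest` would normally replace the hand-written trees. The numbers printed by this sketch come from synthetic data and say nothing about the results reported in the abstract.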

Keywords: anomaly detection; water consumption; isolation forest; k-nearest neighbors; boxplot

Xu Hao (徐浩), Liu Huaili (刘怀利), Qu Xuan (瞿暄)


Anhui and Huaihe River Institute of Hydraulic Research (安徽省(水利部淮河水利委员会)水利科学研究院), Hefei 230088, Anhui, China


Funding: Anhui Provincial Natural Science Joint Fund Project (No. 2208085US05)

2024

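Water Resources and Power (水电能源科学)
Sponsors: China Society for Hydropower Engineering; Huazhong University of Science and Technology; Wuhan Guoce Sanlian Hydropower Equipment Co., Ltd.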


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.525
ISSN: 1000-7709
Year, volume (issue): 2024, 42(9)