Wind Turbine Abnormal Data Cleaning Based on an Improved Isolation Forest Algorithm
Wind speed and power data are key parameters for assessing whether a wind turbine is operating normally. However, the recorded data contain a large amount of abnormal data that must be cleaned. This paper proposes an improved isolation forest algorithm. First, the quartile method is used to determine the dividing line between the normal-data scores and the abnormal-data scores of the isolation forest. Second, the data are divided into wind speed intervals to change the abnormality of the edge data. Finally, an improved least-squares curve fitting method, which removes small-probability discrete and small-probability stacked abnormal data, is used to clean the abnormal wind speed and power data. The results show that, compared with the traditional isolation forest algorithm, the improved isolation forest algorithm correctly defines the dividing line between the normal-data scores and the abnormal-data scores, removes the accumulated abnormal data, and has a better cleaning effect on the discrete abnormal data at the edge of the main data band.
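The first two steps (a quartile-based dividing line applied within wind speed intervals) can be illustrated with a minimal sketch. The abstract does not give the exact rule, so everything here is an assumption: the function names `quartile_threshold` and `flag_by_speed_bin`, the Tukey-style fence Q3 + k·IQR with k = 1.5, the 1 m/s interval width, and the convention that a higher score means a more abnormal point. The anomaly scores are taken as given; if a library uses the opposite sign convention (lower = more abnormal), the scores would need to be negated first.

```python
from collections import defaultdict
from statistics import quantiles


def quartile_threshold(scores, k=1.5):
    """Assumed dividing line: upper Tukey fence Q3 + k * IQR of the scores."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def flag_by_speed_bin(wind_speeds, scores, bin_width=1.0, k=1.5, min_points=4):
    """Flag points whose score exceeds the quartile fence of their own
    wind speed interval (assumed convention: higher score = more abnormal)."""
    # Group sample indices by wind speed interval of width `bin_width`.
    bins = defaultdict(list)
    for i, v in enumerate(wind_speeds):
        bins[int(v // bin_width)].append(i)

    flags = [False] * len(scores)
    for indices in bins.values():
        # Intervals with too few points give unstable quartiles; leave unflagged.
        if len(indices) < min_points:
            continue
        threshold = quartile_threshold([scores[i] for i in indices], k)
        for i in indices:
            flags[i] = scores[i] > threshold
    return flags
```

Computing the fence separately in each interval means that a point is judged against data recorded at similar wind speeds rather than against the whole data set, which is one plausible reading of how the interval division changes the abnormality of the edge data.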