Input feature selection method for wind turbine fault diagnosis based on LightGBM-VIF-MIC-SFS
To address the high error and low classification accuracy in wind turbine fault diagnosis caused by the high dimensionality, feature redundancy and feature correlation of wind turbine supervisory control and data acquisition (SCADA) data, a three-stage feature selection method based on LightGBM-VIF-MIC-SFS is proposed. Firstly, the importance of all features is calculated with LightGBM to determine a preliminary feature space. Secondly, a correlation discriminant matrix is constructed from the variance inflation factor (VIF) and the maximum information coefficient (MIC) to evaluate features of similar importance in a single screening and to discard input features with high similarity. Finally, sequential forward search (SFS) is applied as the third stage: the features retained by the previous two stages are input one by one, and only those that improve system performance are kept, which yields the final feature selection. After the model is established, real SCADA data from a wind farm are used for performance evaluation, and the proposed algorithm is compared with two other feature selection algorithms on six data sets. The results show that LightGBM-VIF-MIC-SFS has significant advantages over the two comparison algorithms. An ablation experiment on the three modules of the proposed algorithm verifies the effectiveness of each module, as well as the rationality and accuracy of the optimal feature space obtained with the proposed method.
wind turbine; feature selection; LightGBM; variance inflation factor; maximum information coefficient; sequential forward search
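The third stage described in the abstract (feeding the pre-screened features in one by one and keeping only those that improve performance) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `forward_select`, the `min_gain` threshold, the feature names and the toy scoring function are all hypothetical, and in practice the score would presumably be a cross-validated accuracy of the fault classifier.

```python
from typing import Callable, List, Sequence


def forward_select(
    ranked_features: Sequence[str],
    score: Callable[[List[str]], float],
    min_gain: float = 0.0,
) -> List[str]:
    """One-pass forward search: keep a feature only if it raises the score.

    `ranked_features` is assumed to be ordered by the earlier stages
    (e.g. by importance); `score` and `min_gain` are hypothetical.
    """
    selected: List[str] = []
    best = float("-inf")
    for feature in ranked_features:
        candidate = selected + [feature]
        value = score(candidate)
        # Retain the feature only when it improves on the best score so far.
        if value > best + min_gain:
            selected, best = candidate, value
    return selected


# Toy stand-in for classifier accuracy; the values are invented for illustration.
_GAIN = {"gearbox_temp": 0.20, "rotor_speed": 0.15, "wind_speed": 0.10}


def toy_score(features: List[str]) -> float:
    # Useful features add to the score; every extra feature costs a small penalty.
    return 0.5 + sum(_GAIN.get(f, 0.0) for f in features) - 0.01 * len(features)


ranked = ["gearbox_temp", "nacelle_temp", "rotor_speed", "wind_speed"]
chosen = forward_select(ranked, toy_score)
print(chosen)  # ['gearbox_temp', 'rotor_speed', 'wind_speed']
```

In this toy run the uninformative `nacelle_temp` is dropped because adding it lowers the score, while the other three features each raise it and are retained.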