SMOTE Algorithm Based on Weighted Complexity and Its Application in Software Defect Prediction
SMOTE has recently been used to handle imbalanced data in software defect prediction. However, existing SMOTE algorithms ignore the fact that samples differ in complexity. In defect prediction, a sample's complexity is related to whether it is defective, so using sample complexity to guide the synthesis of new samples during oversampling is necessary to improve prediction performance. Measuring sample complexity is therefore important. In this paper, we take the weight of each conditional attribute into account when calculating sample complexity, which yields a weighted complexity. Based on the weighted complexity, we propose a SMOTE algorithm, WCP-SMOTE, and apply it to software defect prediction. First, the importance and weight of each conditional attribute in the decision table are calculated using the granularity decision entropy from rough set theory. Second, the weighted complexity of each sample is obtained as the weighted sum of all of its attribute values. Third, the minority samples are sorted in ascending order of weighted complexity, and each pair of adjacent minority samples is averaged, from the beginning of the sorted list to the end, to synthesize new samples until a balanced data set is obtained. Experiments on multiple defect prediction data sets show that handling imbalanced data with WCP-SMOTE yields better software defect prediction performance.
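The second and third steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `weighted_complexity` and `wcp_smote_sketch` are hypothetical, the attribute weights are assumed to be supplied by the caller (the granularity decision entropy computation is not reproduced here), and repeating the sort-and-average pass when one pass does not yield enough samples is an assumption about how "continuously synthesize" might be realized.

```python
def weighted_complexity(sample, weights):
    """Weighted sum of a sample's attribute values (step two)."""
    return sum(w * v for w, v in zip(weights, sample))


def wcp_smote_sketch(minority, weights, n_target):
    """Grow the minority set to n_target samples (step three).

    Assumption: if one pass over adjacent pairs is not enough, the
    enlarged set is re-sorted and the pass is repeated.
    """
    pool = [[float(v) for v in s] for s in minority]
    if len(pool) < 2:
        return pool  # no adjacent pair exists to average
    while len(pool) < n_target:
        # Sort ascending by weighted complexity.
        pool.sort(key=lambda s: weighted_complexity(s, weights))
        need = n_target - len(pool)
        # Average each pair of adjacent samples, from beginning to end.
        mids = [[(a + b) / 2.0 for a, b in zip(p, q)]
                for p, q in zip(pool, pool[1:])]
        pool.extend(mids[:need])
    return pool


# Hypothetical toy data: three minority samples with two attributes each.
minority = [[5, 8], [1, 2], [3, 4]]
weights = [0.5, 0.5]  # placeholder weights, not derived from rough sets
balanced = wcp_smote_sketch(minority, weights, n_target=5)
```

In this toy run the sorted minority samples are `[1, 2]`, `[3, 4]`, `[5, 8]`, so the two synthesized samples are the midpoints `[2, 3]` and `[4, 6]`. In practice `n_target` would typically be set to the size of the majority class.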