Research on an imbalanced data classification algorithm for software defect prediction
An imbalanced data classification model (XG-AJCC), which combines multi-level Adaptive Judgment Synthesis Random oversampling (AJCC-Ram) with XGBoost ensemble learning, is proposed to address the low accuracy of software defect prediction caused by imbalanced data. The model uses the AJCC-Ram method to preprocess the imbalanced data and the XGBoost method to classify it, in order to achieve accurate prediction of software defects. Experimental results show that, compared with the AJCC-Ram sampling method and the adjusted XGBoost method, the XG-AJCC prediction model increases the mean F1 score on the AEEEM and NASA datasets by about 10% and 6%, respectively. Compared with other prediction models, the mean F1 score of this model is significantly higher on both datasets. This indicates that the model has high classification performance and prediction stability, and can accurately predict software defects in imbalanced data.
AJCC-Ram; XGBoost ensemble learning; oversampling; imbalanced data classification; software defect prediction