Research on Imbalanced Data Classification Algorithms in Software Defect Prediction

To address the low accuracy of software defect prediction caused by imbalanced data, this paper proposes XG-AJCC, an imbalanced data classification method that combines multi-level adaptive judgment synthetic random oversampling (AJCC-Ram) with XGBoost ensemble learning. The model uses AJCC-Ram to preprocess the imbalanced data and XGBoost to classify it, with the aim of predicting software defects accurately. Experimental results show that, compared with the AJCC-Ram sampling method and the tuned XGBoost method, the XG-AJCC prediction model improves the mean F1 score on the AEEEM and NASA datasets by about 10% and 6%, respectively. Compared with other prediction models, its mean F1 score is markedly higher on both datasets. These results indicate that the model offers high classification performance and prediction stability and can accurately predict software defects from imbalanced data.
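
The two ingredients named in the abstract, oversampling the minority class before training and scoring with F1, can be illustrated with a minimal sketch. This is not the AJCC-Ram procedure: the abstract does not describe its adaptive judgment step, so plain random oversampling (duplicating minority rows) is swapped in here, and the XGBoost classifier is left out. The function names `random_oversample` and `f1_score` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class.

    Assumption: plain random oversampling, NOT the paper's AJCC-Ram method.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        # Rows of this class in the original (unaugmented) data.
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: 8 clean modules (label 0) and 2 defective modules (label 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In a full pipeline, only the training split would be oversampled, and F1 would be computed on an untouched test split so that duplicated rows do not inflate the score.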

AJCC-Ram; XGBoost ensemble learning; oversampling; imbalanced data classification; software defect prediction

Zhang Jian (张健), Jiang Hong (姜虹)

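School of Automotive and Mechanical-Electrical Engineering, Hanzhong Vocational and Technical College, Hanzhong, Shaanxi 723002

School of Computer Science and Engineering, Xi'an Technological University, Xi'an 710021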

2024

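Information Technology (信息技术)
Heilongjiang Information Technology Society; China Center for Information Industry Development; Electronic Information Center of the Ministry of Information Industry of China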

CSTPCD
Impact factor: 0.413
ISSN:1009-2552
Year, volume (issue): 2024, (12)