Computer Engineering and Design, 2024, Vol. 45, Issue 3: 769-776. DOI: 10.16208/j.issn1000-7024.2024.03.018

Software defect prediction model based on SDL-LightGBM ensemble learning

Xie Huaxiang 1, Gao Jianhua 1, Huang Zijie 2

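Author information

  • 1. Department of Computer Science and Technology, Shanghai Normal University, Shanghai 200234, China
  • 2. Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China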

Abstract

To improve both the accuracy of software defect prediction and the interpretability of the prediction model, this paper proposes a software defect prediction model based on Spearman+DE+LIME+LightGBM (SDL-LightGBM) ensemble learning. A hybrid feature selection method, Spearman+LightGBM, determines the optimal feature subset, reducing model complexity while preserving prediction performance. The ensemble learning algorithm LightGBM (light gradient boosting machine) builds a prediction model on the feature subset, and a differential evolution (DE) algorithm optimizes the model's important hyperparameters. Local interpretable model-agnostic explanations (LIME) provide a local interpretability analysis of the model. Experimental results on 35 versions of 12 projects show that SDL-LightGBM outperforms existing software defect prediction methods: the F1 value increases by 8.97% on average, the AUC value increases by 11.42% on average, and model training time is shortened by 43.6%.
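The abstract does not spell out how the Spearman and LightGBM steps are combined, so the following is only a minimal sketch under stated assumptions: features are visited in descending order of an importance score, and a feature is skipped when its absolute Spearman correlation with an already-kept feature exceeds a threshold. The function names, the 0.8 threshold, the metric names, and the hard-coded importance values (standing in for scores a trained LightGBM model would supply) are all hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no rank information
    return cov / sqrt(var_x * var_y)


def drop_redundant(features, importance, threshold=0.8):
    """Keep features by descending importance, skipping any feature whose
    |rho| with an already-kept feature exceeds the threshold (assumed rule)."""
    kept = []
    for name in sorted(features, key=lambda n: importance[n], reverse=True):
        if all(abs(spearman(features[name], features[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical code metrics for five modules (illustrative values only).
features = {
    "loc": [10, 20, 30, 40, 50],
    "wmc": [1, 4, 9, 16, 25],   # monotone in loc, so rho(loc, wmc) = 1
    "cbo": [3, 1, 4, 1, 5],
}
# Hypothetical importance scores; in practice these might come from a
# trained gradient-boosting model rather than being hard-coded.
importance = {"loc": 0.5, "wmc": 0.3, "cbo": 0.2}

selected = drop_redundant(features, importance)
```

In this toy data, `wmc` is a monotone function of `loc`, so it is treated as redundant and dropped, while `cbo` is only weakly rank-correlated with `loc` and is retained. Visiting features in importance order is one simple way to decide which member of a correlated pair survives; other tie-breaking rules are equally plausible.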

Key words

defect prediction/machine learning/ensemble learning/feature selection/model optimization/model interpretation/differential evolution

Funding

National Natural Science Foundation of China (61672355)

Publication year

2024
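Computer Engineering and Design
Institute 706, Second Academy of China Aerospace Science and Industry Corporation

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.617
ISSN: 1000-7024
Number of references: 22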