
SMOTE Algorithm Based on Weighted Complexity and Its Application in Software Defect Prediction

In recent years, SMOTE has been widely used to handle unbalanced data in software defect prediction. However, existing SMOTE algorithms generally overlook the fact that the complexity of different samples varies greatly. In defect prediction, a sample's complexity is closely related to whether it is defective, so it is worthwhile to use sample complexity to guide the synthesis of new samples during oversampling and thereby improve prediction performance. How sample complexity is measured matters: this paper takes the weight of every condition attribute into account when computing it, which yields the notion of weighted complexity. Based on weighted complexity, a new SMOTE algorithm, WCP-SMOTE, is proposed and applied to software defect prediction. WCP-SMOTE first uses the granularity decision entropy from rough set theory to compute the significance and weight of each condition attribute in the decision table. Second, it obtains the weighted complexity of a sample as the weighted sum of the sample's values over all attributes. Third, it sorts the minority-class samples in ascending order of weighted complexity and, moving from the first to the last, averages each pair of adjacent minority-class samples to synthesize new samples, repeating until a balanced data set is obtained. Experiments on multiple defect prediction data sets show that handling unbalanced data with WCP-SMOTE yields better software defect prediction performance.
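The second and third steps described in the abstract can be sketched in a few lines of Python. This is only an illustrative reading of the abstract, not the authors' implementation. The names `weighted_complexity`, `wcp_smote_sketch`, and `n_new` are hypothetical. The attribute weights are taken as given, because the abstract does not define how granularity decision entropy is computed. The abstract also does not say what happens when one pass over the sorted samples is not enough, so the sketch assumes the new samples are merged into the pool, which is then re-sorted for another pass.

```python
from typing import List, Sequence

Sample = List[float]


def weighted_complexity(sample: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of a sample's attribute values (step 2 of the abstract)."""
    return sum(w * x for w, x in zip(weights, sample))


def wcp_smote_sketch(minority: Sequence[Sequence[float]],
                     weights: Sequence[float],
                     n_new: int) -> List[Sample]:
    """Return n_new synthetic minority samples (step 3 of the abstract).

    Assumption: `weights` comes from an earlier attribute-weighting step
    that is not reproduced here.
    """
    if n_new > 0 and len(minority) < 2:
        raise ValueError("need at least two minority samples to average")

    pool = [list(s) for s in minority]
    synthetic: List[Sample] = []
    while len(synthetic) < n_new:
        # Sort the current pool in ascending order of weighted complexity.
        pool.sort(key=lambda s: weighted_complexity(s, weights))
        new_this_pass: List[Sample] = []
        # Walk the sorted pool and average each pair of adjacent samples.
        for a, b in zip(pool, pool[1:]):
            if len(synthetic) + len(new_this_pass) >= n_new:
                break
            new_this_pass.append([(x + y) / 2 for x, y in zip(a, b)])
        synthetic.extend(new_this_pass)
        # Assumption: synthetic samples join the pool for the next pass.
        pool.extend(new_this_pass)
    return synthetic


# Toy data: three minority samples with two attributes and equal weights.
minority = [[4.0, 2.0], [1.0, 1.0], [2.0, 3.0]]
weights = [0.5, 0.5]
print(wcp_smote_sketch(minority, weights, 2))  # [[1.5, 2.0], [3.0, 2.5]]
```

In practice `n_new` would be the gap between the majority and minority class sizes, so that the combined data set ends up balanced.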

Keywords: software defect prediction; unbalanced data; rough set; granularity decision entropy; weighted complexity; SMOTE

Wei Wei (魏威), Jiang Feng (江峰)


College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061


Funding: National Natural Science Foundation of China (61973180, 61671261); Natural Science Foundation of Shandong Province (ZR2018MF007)

2024

Computer & Digital Engineering (计算机与数字工程)
709th Research Institute, China Shipbuilding Industry Corporation


CSTPCD
Impact factor: 0.355
ISSN: 1672-9722
Year, volume (issue): 2024, 52(5)