A Just-in-Time Software Defect Prediction Method Based on Dataset Augmentation

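Abstract (translated from the Chinese): Just-in-time (JIT) software defect prediction predicts whether a code commit made during project development and maintenance will introduce defects. In this field, model training depends on high-quality datasets, yet existing JIT defect prediction methods have not examined how dataset augmentation affects prediction. To improve prediction performance, the paper proposes a JIT software defect prediction method based on dataset augmentation, called prediction based on data augmentation (PDA). PDA consists of four parts: feature stitching (concatenation), sample generation, sample filtering, and sampling. The augmented dataset has a sufficient number of samples, high sample quality, and no class imbalance. Compared with the latest JIT software defect prediction method (JIT-Fine), PDA improves the F1 score by 18.33% on the JIT-Defects4J dataset and still improves it by 3.67% on the LLTC4J dataset, which supports PDA's generalization ability. Ablation experiments show that the performance gain comes mainly from the dataset augmentation and filtering mechanisms.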

English title: A just-in-time software defect prediction method based on data augmentation

English abstract (as published): Just-in-time (JIT) software defect prediction aims to predict whether code commits during project development and maintenance will introduce defects. In the field of JIT software defect prediction research, model training relies on high-quality datasets. However, the impact of dataset augmentation methods on JIT software defect prediction has not been thoroughly investigated in existing methods. To enhance the performance of JIT software defect prediction, a method based on dataset augmentation, named prediction based on data augmentation (PDA), is proposed. PDA includes four parts: feature stitching, sample generation, sample filtering, and sampling processing. The augmented dataset has an ample number of samples with high quality and eliminates the class imbalance problem. Comparing the proposed PDA method with the latest JIT software defect prediction method (JIT-Fine), results indicate an 18.33% improvement in the F1 score on the JIT-Defects4J dataset and a 3.67% improvement on the LLTC4J dataset, demonstrating PDA's generalization ability. Ablation studies have confirmed that the performance improvement of the proposed PDA method mainly comes from dataset augmentation and filtering mechanisms.
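
The abstract names four stages but does not say how each one is implemented. The following is only an illustrative sketch of how such a pipeline could be wired together, not the authors' method. Every function name is hypothetical, and the concrete techniques are stand-in assumptions: pairwise interpolation for sample generation, a centroid-distance test for sample filtering, and random undersampling for the sampling step. Label 1 is assumed to mark defect-inducing commits and to be the minority class.

```python
import numpy as np

def concat_features(expert, semantic):
    """Stage 1 (assumed): join two per-commit feature views column-wise."""
    return np.hstack([expert, semantic])

def generate_samples(X_min, n_new, rng):
    """Stage 2 (assumed): synthesize minority samples by interpolating random pairs."""
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_min), size=n_new)
    t = rng.random((n_new, 1))
    return X_min[i] + t * (X_min[j] - X_min[i])

def filter_samples(X_new, X_min, X_maj):
    """Stage 3 (assumed): keep synthetic samples nearer the minority centroid."""
    d_min = np.linalg.norm(X_new - X_min.mean(axis=0), axis=1)
    d_maj = np.linalg.norm(X_new - X_maj.mean(axis=0), axis=1)
    return X_new[d_min < d_maj]

def balance(X, y, rng):
    """Stage 4 (assumed): randomly undersample the larger class."""
    idx0 = np.flatnonzero(y == 0)
    idx1 = np.flatnonzero(y == 1)
    n = min(len(idx0), len(idx1))
    keep = np.concatenate([
        rng.choice(idx0, n, replace=False),
        rng.choice(idx1, n, replace=False),
    ])
    return X[keep], y[keep]

def augment(expert, semantic, y, n_new, seed=0):
    """Run the four hypothetical stages and return a class-balanced dataset."""
    rng = np.random.default_rng(seed)
    X = concat_features(expert, semantic)
    X_min, X_maj = X[y == 1], X[y == 0]
    X_new = filter_samples(generate_samples(X_min, n_new, rng), X_min, X_maj)
    X_all = np.vstack([X, X_new])
    y_all = np.concatenate([y, np.ones(len(X_new), dtype=y.dtype)])
    return balance(X_all, y_all, rng)
```

Calling `augment(expert, semantic, y, n_new=8)` on two feature matrices with one row per commit returns a dataset with equal numbers of positive and negative samples. A real implementation would likely use a learned generator and a model-based filter; the sketch only shows the order of the stages.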

English keywords:

data augmentation; deep learning; just-in-time defect prediction; sample generation; imbalanced datasets

Authors:

杨帆, 夏鸿崚

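Affiliations:

Library and Information Center (图文信息中心), Jiangsu College of Engineering and Technology, Nantong, Jiangsu 226006

School of Information Science and Technology, Nantong University, Nantong, Jiangsu 226019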

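Keywords (translated from the Chinese):

data augmentation; deep learning; just-in-time software defect prediction; sample generation; class imbalance problem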

Funding:

Nantong Science and Technology Program, General Project (南通市科技计划面上项目)

Project number:

JC2023070

Publication year:

2024

DOI:

10.12194/j.ntu.20231206001

Journal of Nantong University (Natural Science Edition)

Nantong University

Impact factor: 0.292

ISSN: 1673-2340

Year, volume (issue): 2024, 23(1)

Number of references: 20