Cross-project Open Source Software Defect Prediction Based on Multi-Source Domain Adaptation and Data Augmentation

Predicting defects by mining software repositories (MSR) is an important way to improve software quality and security. Many machine-learning-based approaches have been proposed that predict defects from the data mined from such repositories. However, because the defect data extracted from different repositories is heterogeneous, the prediction performance and robustness of these approaches are often unsatisfactory. To address this, the paper proposes DPDA, a defect prediction approach based on multi-source domain adaptation and data augmentation. The approach improves prediction accuracy by exploiting feature similarities between the various source repositories and the target repository. On the one hand, it uses a weighted maximum mean discrepancy (MMD) to minimize the distance between their feature distributions. On the other hand, it uses an attention mechanism to assign a score to each source repository, raising the weight of sources that are highly similar to the target repository and reducing the influence of irrelevant ones. Comparative experiments show that the proposed approach achieves the best defect prediction performance among the compared methods.
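The abstract does not give the formulas behind the weighted MMD or the attention scores, so the following is only a minimal sketch of the general idea, not the authors' DPDA implementation. It assumes a Gaussian (RBF) kernel, a biased estimate of squared MMD, and a softmax over negative distances as a stand-in for the attention mechanism. The function names and the `gamma` and `temperature` parameters are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    # Pairwise squared Euclidean distances, then a Gaussian kernel.
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(x, y, gamma=0.1):
    # Biased estimate of the squared maximum mean discrepancy.
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

def source_weights(sources, target, gamma=0.1, temperature=0.1):
    # Assumption: sources closer to the target (smaller MMD) get larger
    # weights through a softmax over negative distances.
    d = np.array([mmd2(s, target, gamma) for s in sources])
    logits = -d / temperature
    logits -= logits.max()  # numerical stability
    w = np.exp(logits)
    return w / w.sum(), d

def weighted_mmd_loss(sources, target, gamma=0.1, temperature=0.1):
    # Weighted sum of per-source distances; in training this term would
    # be minimized together with the classification loss.
    w, d = source_weights(sources, target, gamma, temperature)
    return float(w @ d)

# Toy data: one source repository resembles the target, one does not.
rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(200, 5))
similar = rng.normal(0.1, 1.0, size=(200, 5))
dissimilar = rng.normal(3.0, 1.0, size=(200, 5))

weights, distances = source_weights([similar, dissimilar], target)
print("MMD^2 per source:", distances)
print("source weights:  ", weights)
```

In this toy setup the source drawn from nearly the same distribution as the target receives most of the weight, which mirrors the abstract's goal of emphasizing similar repositories and damping irrelevant ones.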

Keywords: defect prediction; multi-source domain adaptation; attention mechanism; data augmentation

Li Guangjie (李光杰), Tang Yi (唐艺), He Yan (何焱), Zhang Qilei (张启磊), Xing Ying (邢颖), Zhao Mengci (赵梦赐)

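National Innovation Institute of Defense Technology, Academy of Military Sciences, Beijing 100071, China

Beijing University of Posts and Telecommunications, Beijing 100876, China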

2024

智能安全
National Innovation Institute of Defense Technology, Academy of Military Sciences

ISSN: 2097-2075
Year, Volume (Issue): 2024, 3(1)
  • 34