Cross-project Open Source Software Defect Prediction Based on Multi-Source Domain Adaptation and Data Augmentation
Predicting defects by mining software repositories (MSR) is crucial for enhancing the security and quality of software. With extensive collections of software defect data acquired by mining various repositories, numerous machine learning-based approaches have been proposed for defect detection. However, because vulnerability data originating from different repositories is heterogeneous, the robustness of these approaches is significantly compromised. In light of this, we propose a defect prediction approach based on multi-source Domain Adaptation and Data Augmentation (DPDA). Our approach mines feature similarities between the various source repositories and the target repository. Specifically, it employs a weighted maximum mean discrepancy (MMD) to minimize the distribution distance between their features. Meanwhile, attention scores are assigned to the different sources so that source repositories with high similarity to the target repository receive larger weights. This weighting focuses the model on the source repositories most similar to the target and reduces the impact of irrelevant repositories. Comparative experiments demonstrate that our approach achieves the best performance among the compared approaches in predicting software defects.
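The source-weighting idea described above can be illustrated with a minimal sketch. It is not the DPDA implementation: the RBF kernel, the biased MMD estimator, the softmax over negative MMD values used as a stand-in for the attention scores, and the names `rbf_kernel`, `mmd2`, `source_weights`, `weighted_mmd_loss`, `gamma`, and `temperature` are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise squared Euclidean distances, then a Gaussian (RBF) kernel.
    sq = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(x, y, gamma=1.0):
    # Biased estimate of the squared maximum mean discrepancy
    # between feature samples x and y.
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

def source_weights(sources, target, gamma=1.0, temperature=1.0):
    # Assumed stand-in for the attention scores: a softmax over negative
    # MMD values, so sources closer to the target receive larger weights.
    d = np.array([mmd2(s, target, gamma) for s in sources])
    logits = -d / temperature
    logits = logits - logits.max()  # numerical stability
    w = np.exp(logits)
    return w / w.sum(), d

def weighted_mmd_loss(sources, target, gamma=1.0, temperature=1.0):
    # Weighted sum of per-source MMD values; in a trained model this
    # quantity would be minimized over the feature extractor's parameters.
    w, d = source_weights(sources, target, gamma, temperature)
    return float(np.dot(w, d))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(0.0, 1.0, size=(200, 4))
    similar = rng.normal(0.1, 1.0, size=(200, 4))    # close to the target
    dissimilar = rng.normal(3.0, 1.0, size=(200, 4))  # far from the target
    w, d = source_weights([similar, dissimilar], target,
                          gamma=0.5, temperature=0.1)
    print("MMD^2 per source:", d)
    print("source weights:  ", w)
```

In this toy setup the source drawn from a distribution near the target gets the larger weight, which mirrors the stated goal of reducing the influence of irrelevant repositories. The `temperature` parameter is a hypothetical knob controlling how sharply the weights concentrate on the most similar source.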