A Software Defect Prediction Model Combining an Improved Sampling Algorithm and Unsupervised Clustering
First, building on the adaptive synthetic oversampling algorithm ADASYN (adaptive synthetic sampling), the connectivity among clusters of different densities within the minority class is taken into account, and points at an intermediate neighbor distance from the sampling points are included in the range used to generate new samples, yielding the T-ADASYN oversampling optimization algorithm. T-ADASYN effectively increases the connectivity of clusters with different densities within the minority class and generates a more balanced data set. The connectivity-based Spectral Clustering algorithm is then used for the clustering prediction step, thus combining an oversampling algorithm with unsupervised clustering for the first time and yielding a novel and practical software defect prediction model, TA-SC (T-ADASYN + Spectral Clustering). With F-Score as the evaluation metric and Spectral Clustering as the clustering model for validation, the experimental results show that the improved T-ADASYN oversampling algorithm outperforms commonly used oversampling algorithms by an average of 6% and 6% on the publicly available PROMISE and NASA datasets, respectively, and that the TA-SC model achieves the best results, improving on commonly used clustering algorithms by up to 3% and 2% on the two datasets, respectively.
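For orientation, the baseline that T-ADASYN extends can be sketched in a few lines. The code below is a minimal, standard-library-only illustration of plain ADASYN together with the F-Score used for evaluation; it is not the T-ADASYN variant, whose intermediate-neighbor rule is not specified in this abstract. The function names (`k_nearest`, `adasyn`, `f_score`), the parameters `k`, `beta`, and `seed`, and the convention that label 1 marks the defective (minority) class are all assumptions made for illustration.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return indices of the k candidates closest to `point` (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda j: math.dist(point, candidates[j]))
    return order[:k]


def adasyn(X, y, k=5, beta=1.0, seed=0):
    """Return synthetic minority samples (label 1) generated by baseline ADASYN."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    n_min, n_maj = len(minority), len(X) - len(minority)
    total = (n_maj - n_min) * beta  # number of synthetic samples wanted

    # Difficulty ratio: share of majority points among each sample's k neighbours.
    ratios = []
    for x in minority:
        others = [(p, label) for p, label in zip(X, y) if p is not x]
        idx = k_nearest(x, [p for p, _ in others], k)
        n_majority = sum(1 for j in idx if others[j][1] == 0)
        ratios.append(n_majority / len(idx) if idx else 0.0)

    norm = sum(ratios)
    if total <= 0 or norm == 0:
        return []  # already balanced, or no minority point borders the majority

    synthetic = []
    for x, ratio in zip(minority, ratios):
        n_new = round(ratio / norm * total)  # harder points get more samples
        peers = [p for p in minority if p is not x]
        neighbours = [peers[j] for j in k_nearest(x, peers, k)]
        if not neighbours:
            continue  # a lone minority point has nothing to interpolate towards
        for _ in range(n_new):
            z = rng.choice(neighbours)
            lam = rng.random()
            # Interpolate between x and a randomly chosen minority neighbour.
            synthetic.append([a + lam * (b - a) for a, b in zip(x, z)])
    return synthetic


def f_score(y_true, y_pred):
    """F1 score for the positive class (label 1): 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


if __name__ == "__main__":
    # Toy data: a 6x6 grid of majority points with four minority points inside it.
    majority = [[float(i), float(j)] for i in range(6) for j in range(6)]
    minority = [[2.5, 2.5], [2.4, 2.6], [0.5, 0.5], [4.5, 4.5]]
    X, y = majority + minority, [0] * 36 + [1] * 4
    new_points = adasyn(X, y, k=3)
    print(len(new_points), "synthetic minority samples generated")
```

In a TA-SC-style pipeline, the balanced data set produced by the oversampling step would then be passed to a spectral clustering implementation, and the resulting cluster assignments would be scored against the true defect labels with `f_score`. That clustering step is omitted here to keep the sketch dependency-free.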