A Neighborhood-Based Over-Sampling Algorithm for Imbalanced Datasets
Oversampling is a commonly used method for addressing imbalanced class distributions in a dataset by synthesizing new minority-class samples. This paper proposes PSON, an oversampling algorithm based on the neighborhood concept, to address imbalanced sample distribution in a dataset. The algorithm defines an influence for each minority-class sample and oversamples the minority class according to these influences to obtain a balanced dataset. Classification tests were conducted on the datasets produced by applying 8 oversampling algorithms to 50 datasets. The Wilcoxon signed-rank test was used to compare 7 classification performance indicators, and the results showed that the PSON algorithm significantly improved classification accuracy.
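The general idea of influence-weighted oversampling can be illustrated with a minimal sketch. This is not the PSON algorithm itself: the abstract does not state how influence is computed or how new samples are synthesized. The sketch assumes, purely for illustration, that a minority sample's influence is the smoothed share of majority-class points among its k nearest neighbors, and that new samples are produced by SMOTE-style interpolation between minority neighbors. The names `influence_scores`, `oversample`, and `k_nearest` are hypothetical.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return the k (sample, label) pairs closest to point (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(point, c[0]))[:k]


def influence_scores(minority, majority, k=3):
    """Assumed influence (not taken from the paper): the share of majority
    points among the k nearest neighbors, with +1 smoothing so that every
    minority sample keeps a nonzero weight."""
    scores = []
    for i, x in enumerate(minority):
        others = [(m, 1) for j, m in enumerate(minority) if j != i]
        others += [(m, 0) for m in majority]
        neighbors = k_nearest(x, others, k)
        n_majority = sum(1 for _, label in neighbors if label == 0)
        scores.append((n_majority + 1) / (k + 1))
    return scores


def oversample(minority, majority, k=3, seed=0):
    """Generate synthetic minority samples until both classes have equal size."""
    rng = random.Random(seed)
    n_new = len(majority) - len(minority)
    if n_new <= 0 or len(minority) < 2:
        return []
    weights = influence_scores(minority, majority, k)
    synthetic = []
    for _ in range(n_new):
        # Pick a base sample with probability proportional to its influence.
        i = rng.choices(range(len(minority)), weights=weights)[0]
        x = minority[i]
        # Interpolate toward one of its nearest minority neighbors (SMOTE-style).
        mates = [(m, 1) for j, m in enumerate(minority) if j != i]
        neighbor = rng.choice(k_nearest(x, mates, k))[0]
        t = rng.random()
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(x, neighbor)))
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    majority = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0),
                (6.0, 6.0), (1.2, 0.1), (7.0, 7.0)]
    for p in oversample(minority, majority, k=3, seed=1):
        print(p)
```

Under these assumptions, minority samples surrounded by majority points receive more synthetic neighbors than samples deep inside the minority region. A different influence definition can be swapped in by replacing `influence_scores` alone.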