SMOTE-RkNN: A hybrid re-sampling method based on SMOTE and reverse k-nearest neighbors

In recent years, class imbalance learning (CIL) has become an important branch of machine learning. The Synthetic Minority Oversampling TEchnique (SMOTE) is considered a benchmark algorithm among CIL techniques. Although SMOTE performs well on the vast majority of class-imbalance tasks, it has the inherent drawback of noise propagation. Many SMOTE variants have been proposed to address this problem. Generally, these improved solutions use a hybrid sampling procedure, i.e., they carry out an undersampling step after SMOTE to remove noise. However, owing to the complexity of data distributions, it is sometimes difficult to accurately identify real instances of noise, which results in low modeling quality. In this paper, we propose a more robust and universal SMOTE hybrid variant named SMOTE-reverse k-nearest neighbors (SMOTE-RkNN). The proposed algorithm identifies noise based on probability density rather than on local neighborhood information. Specifically, the probability density information of each instance is provided by RkNN, a well-known kNN variant. Noisy instances are found and deleted according to their relevant probability density. In experiments on 46 class-imbalanced data sets, SMOTE-RkNN showed promising results in comparison with several popular SMOTE hybrid variant algorithms. © 2022 Elsevier Inc. All rights reserved.
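The hybrid procedure described in the abstract (SMOTE oversampling followed by RkNN-based noise removal) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `smote`, `reverse_knn_counts`, and `smote_then_filter` are hypothetical, the reverse-neighbor count is used only as a crude stand-in for the paper's probability density estimate, and the removal rule (drop an instance when more other-class than same-class points list it as a neighbor) is an assumption made for illustration.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Plain SMOTE: interpolate between a minority point and one of its
    k nearest minority-class neighbors."""
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbor
    k = min(k, len(X_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]        # nn[i] = k nearest neighbors of i
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))             # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


def reverse_knn_counts(X, y, k=5):
    """For each point j, count how many same-class and other-class points
    include j among their own k nearest neighbors (reverse neighbors)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    same = np.zeros(len(X), dtype=int)
    other = np.zeros(len(X), dtype=int)
    for i in range(len(X)):
        for j in nn[i]:                      # i is a reverse neighbor of j
            if y[i] == y[j]:
                same[j] += 1
            else:
                other[j] += 1
    return same, other


def smote_then_filter(X, y, minority=1, k=5, rng=0):
    """Oversample the minority class to balance, then drop suspected noise."""
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    X_syn = smote(X_min, n_new, k=k, rng=rng)
    X_all = np.vstack([X, X_syn])
    y_all = np.concatenate([y, np.full(n_new, minority)])
    same, other = reverse_knn_counts(X_all, y_all, k=k)
    keep = same >= other                     # assumed rule, not the paper's criterion
    return X_all[keep], y_all[keep]
```

Calling `smote_then_filter(X, y)` on a feature matrix `X` and a 0/1 label vector `y` returns the balanced, filtered set. The design point the sketch tries to convey is that a reverse-neighbor count depends on how the whole data set "votes" for a point, rather than only on the labels inside that point's own neighborhood.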

Class imbalance learning; SMOTE; Hybrid sampling; Reverse k-nearest neighbors; Probability density estimation; Noise filtering; IMBALANCED DATASETS; OVERSAMPLING TECHNIQUE; CLASSIFICATION; ENSEMBLES

Zhang, Aimin; Yu, Hualong; Huan, Zhangjun; Yang, Xibei; Zheng, Shang; Gao, Shang

Jiangsu Univ Sci & Technol

2022

Information Sciences

EI; SCI
ISSN:0020-0255
Year, Volume (Issue): 2022, 595
  • 19
  • 50