Joint Sample Position Based Noise Filtering and Mean Shift Clustering for Imbalanced Classification Learning

The problem of imbalanced data classification learning has received much attention. Conventional classification algorithms are susceptible to data skew: they tend to favor majority samples and ignore minority samples. The majority weighted minority oversampling technique (MWMOTE) is an effective approach to this problem; however, it can suffer from inadequate noise filtering and from synthesizing samples identical to the original minority data. To address these shortcomings, we propose an improved MWMOTE method named joint sample position based noise filtering and mean shift clustering (SPMSC). First, to effectively eliminate the effect of noisy samples, SPMSC uses a new noise filtering mechanism that decides whether a minority sample is noisy based on its position and distribution relative to the majority samples. Second, because MWMOTE may generate duplicate samples, SPMSC employs the mean shift algorithm to cluster the minority samples, which reduces the number of duplicate synthetic samples. Finally, data cleaning is performed on the processed data to further eliminate class overlap. Experiments on extensive benchmark datasets demonstrate the effectiveness of SPMSC compared with other sampling methods.
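
As a rough illustration of the clustering step only, the sketch below groups minority samples with a flat-kernel mean shift and then interpolates new samples between pairs drawn from the same cluster. It is not the authors' implementation: the function names, the `bandwidth` value, the mode-merging rule, and the interpolation range are all assumptions made for this example, and the noise filtering and data cleaning stages of SPMSC are omitted.

```python
import math
import random


def mean_shift_labels(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift; returns one cluster label per input point."""
    modes = []
    for p in points:
        x = tuple(p)
        for _ in range(max_iter):
            # Samples within `bandwidth` of the current estimate
            nbrs = [q for q in points if math.dist(x, q) <= bandwidth]
            if not nbrs:  # guard against floating-point edge cases
                break
            new_x = tuple(sum(c) / len(nbrs) for c in zip(*nbrs))
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        modes.append(x)

    # Merge modes that lie closer than bandwidth / 2 (an assumed rule)
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


def interpolate_within_clusters(minority, labels, n_new, rng):
    """Create n_new synthetic samples, each between two samples of one cluster."""
    clusters = {}
    for p, lab in zip(minority, labels):
        clusters.setdefault(lab, []).append(p)
    # Only clusters with at least two members can be interpolated
    usable = [c for c in clusters.values() if len(c) >= 2]
    synthetic = []
    while usable and len(synthetic) < n_new:
        a, b = rng.sample(rng.choice(usable), 2)
        # Stay away from the endpoints so the result is not a copy of a or b
        t = rng.uniform(0.05, 0.95)
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


# Toy minority set with two well-separated groups (illustrative data only)
minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = mean_shift_labels(minority, bandwidth=1.0)
new_points = interpolate_within_clusters(minority, labels, n_new=4,
                                         rng=random.Random(0))
```

Restricting interpolation to pairs from the same cluster keeps synthetic samples inside regions already occupied by minority data, and keeping the interpolation weight away from 0 and 1 avoids reproducing either endpoint when the two chosen samples are distinct.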

Keywords: imbalanced data classification; oversampling; noise filtering; clustering

Lilong Duan, Wei Xue, Jun Huang, Xiao Zheng

School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China

Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China

Funding: Anhui Provincial Natural Science Foundation (No. 2208085MF168); Program for Synergy Innovation in the Anhui Higher Education Institutions of China (Nos. GXXT-2019-025 and GXXT-2022-052)

2024

Tsinghua Science and Technology
Tsinghua University

Indexed in: CSTPCD, EI
Impact factor: 0.474
ISSN: 1007-0214
Year, volume (issue): 2024, 29(1)
  • 46