Tsinghua Science and Technology, 2024, Vol. 29, Issue 1: 216-231. DOI: 10.26599/TST.2023.9010006

Joint Sample Position Based Noise Filtering and Mean Shift Clustering for Imbalanced Classification Learning

Lilong Duan, Wei Xue, Jun Huang, Xiao Zheng

Author information

  • 1. School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China

Abstract

Imbalanced data classification has received much attention. Conventional classification algorithms are susceptible to data skew: they favor majority samples and neglect minority samples. The majority weighted minority oversampling technique (MWMOTE) is an effective approach to this problem; however, it can suffer from inadequate noise filtering and from synthesizing samples identical to the original minority data. To address these shortcomings, we propose an improved MWMOTE method named joint sample position based noise filtering and mean shift clustering (SPMSC). First, to effectively eliminate the effect of noisy samples, SPMSC uses a new noise filtering mechanism that determines whether a minority sample is noisy based on its position and distribution relative to the majority samples. Second, because MWMOTE may generate duplicate samples, we employ the mean shift algorithm to cluster the minority samples and thereby reduce duplicated synthetic samples. Finally, data cleaning is performed on the processed data to further eliminate class overlap. Experiments on extensive benchmark datasets demonstrate the effectiveness of SPMSC compared with other sampling methods.
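The clustering step in the abstract can be illustrated with a minimal sketch. This is not the authors' SPMSC implementation: it only shows the general idea of grouping minority samples with a flat-kernel mean shift and then interpolating within a cluster while rejecting duplicates. The noise filtering and data cleaning steps are omitted, and the function names `mean_shift` and `oversample`, the `bandwidth` parameter, and the merge threshold are all assumptions made for illustration.

```python
import math
import random


def mean_shift(points, bandwidth, iters=50, tol=1e-6):
    """Flat-kernel mean shift; returns one cluster label per input point."""
    pts = [tuple(p) for p in points]
    modes = list(pts)
    for _ in range(iters):
        moved = 0.0
        new_modes = []
        for m in modes:
            # Neighbours of the current mode within the bandwidth.
            nbrs = [p for p in pts if math.dist(m, p) <= bandwidth]
            if not nbrs:  # guard against floating-point edge cases
                new_modes.append(m)
                continue
            centre = tuple(sum(c) / len(nbrs) for c in zip(*nbrs))
            moved = max(moved, math.dist(m, centre))
            new_modes.append(centre)
        modes = new_modes
        if moved < tol:
            break
    # Merge modes that ended up close together (threshold is an assumption).
    centres, labels = [], []
    for m in modes:
        for i, c in enumerate(centres):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(i)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return labels


def oversample(minority, n_new, bandwidth, seed=0):
    """Interpolate between pairs from the same cluster, skipping duplicates."""
    rng = random.Random(seed)
    pts = [tuple(p) for p in minority]
    clusters = {}
    for p, lab in zip(pts, mean_shift(pts, bandwidth)):
        clusters.setdefault(lab, []).append(p)
    # Only clusters with at least two distinct points can yield new samples.
    usable = [c for c in clusters.values() if len(set(c)) >= 2]
    seen = set(pts)
    synthetic = []
    attempts = 0
    while usable and len(synthetic) < n_new and attempts < 100 * n_new:
        attempts += 1
        a, b = rng.sample(rng.choice(usable), 2)
        t = rng.random()
        s = tuple(x + t * (y - x) for x, y in zip(a, b))
        if s not in seen:  # reject copies of originals or earlier synthetics
            seen.add(s)
            synthetic.append(s)
    return synthetic
```

Calling `oversample(minority, n_new=4, bandwidth=1.0)` on a list of coordinate tuples returns up to four new points, each lying on a segment between two minority samples of the same cluster. Restricting interpolation to one cluster keeps synthetic points out of the gaps between separate minority groups.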

Key words

imbalanced data classification; oversampling; noise filtering; clustering

Funding

Anhui Provincial Natural Science Foundation (2208085MF168)

Program for Synergy Innovation in the Anhui Higher Education Institutions of China (GXXT-2019-025)

Program for Synergy Innovation in the Anhui Higher Education Institutions of China (GXXT-2022-052)

Publication year

2024
Tsinghua Science and Technology
Tsinghua University

Indexed in: CSTPCD, EI
Impact factor: 0.474
ISSN: 1007-0214
Number of references: 46