DC-SMOTE oversampling method for imbalanced datasets

To address the poor performance of classifiers on imbalanced datasets, an oversampling algorithm based on local density and centrality is proposed. First, for every minority-class sample in the dataset, the local density is computed with a Gaussian kernel function and the centrality with local gravity. A first type of new sample is then synthesized specifically in regions of low local density, which addresses within-class imbalance. Next, differences in centrality are used to identify the boundary of the minority class, and a second type of new sample is synthesized there to strengthen the boundary. New samples are generated adaptively, which addresses a weakness of most oversampling algorithms: they either do not specify how much to oversample or blindly pursue equal numbers of samples in the two classes. Finally, experiments on 12 public imbalanced datasets show that the algorithm performs well on both low-imbalance and high-imbalance datasets.
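The density-guided step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the density is taken as a sum of Gaussian kernel values over the other minority points, "low density" is defined by a quantile threshold, and new points are produced by SMOTE-style linear interpolation toward a nearby minority neighbor. The names `gaussian_local_density`, `oversample_sparse`, `sigma`, `k`, and `quantile` are hypothetical, and the centrality (local gravity) step and the adaptive choice of sample count are not covered.

```python
import numpy as np


def gaussian_local_density(X_min, sigma=1.0):
    """Assumed density: sum of Gaussian kernel values to all other minority points."""
    diff = X_min[:, None, :] - X_min[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)           # squared pairwise distances
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(K, 0.0)                  # a point does not count itself
    return K.sum(axis=1)


def oversample_sparse(X_min, n_new, sigma=1.0, k=3, quantile=0.5, seed=0):
    """Interpolate new minority samples, seeded only at low-density points.

    Assumption: a point is "sparse" if its density is at or below the given
    quantile; the paper's actual criterion may differ.
    """
    rng = np.random.default_rng(seed)
    rho = gaussian_local_density(X_min, sigma)
    sparse_idx = np.flatnonzero(rho <= np.quantile(rho, quantile))

    # k nearest minority neighbors of every point (excluding the point itself)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]

    new = []
    for _ in range(n_new):
        i = rng.choice(sparse_idx)            # seed point from a sparse region
        j = rng.choice(nn[i])                 # one of its nearest neighbors
        lam = rng.random()                    # interpolation weight in [0, 1)
        new.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(new)
```

Because each synthetic point is a convex combination of two existing minority points, it always lies on the segment between them. This sketch requires `k` to be smaller than the number of minority samples.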

imbalanced dataset; oversampling; Gaussian kernel; local gravity; high-imbalanced data; SMOTE; imbalance ratio; classification

JI Changpeng (冀常鹏), SHANG Jiaqi (尚佳奇), DAI Wei (代巍)

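School of Electronic and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105

Graduate School, Liaoning Technical University, Huludao, Liaoning 125105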

2024

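CAAI Transactions on Intelligent Systems (智能系统学报)
Chinese Association for Artificial Intelligence; Harbin Engineering University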

Indexed in: CSTPCD; PKU Core (北大核心)
Impact factor: 0.672
ISSN: 1673-4785
Year, volume (issue): 2024, 19(3)