
A Neighborhood-Based Over-Sampling Algorithm for Imbalanced Datasets

Oversampling is a common way to address imbalanced class distributions in a dataset by synthesizing new samples of the same class. To address imbalanced sample distributions, this paper proposes PSON, an oversampling algorithm based on the concept of neighborhoods. The algorithm defines an influence for each minority-class sample and oversamples the minority class according to these influences to obtain a balanced dataset. The datasets produced by 8 oversampling algorithms on 50 datasets were evaluated in classification tests, and 7 classification performance metrics were compared using the Wilcoxon signed-rank test. The results show that PSON significantly improves classification accuracy.
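
The abstract does not give PSON's actual definition of influence, so the following is only a minimal sketch of the general idea of influence-weighted oversampling, not the authors' method. It assumes, purely for illustration, that a minority sample's influence is one plus its reverse-neighbor count (how many other minority samples list it among their k nearest neighbors), and that new samples are produced by SMOTE-style linear interpolation. The names `k_nearest` and `influence_weighted_oversample` are hypothetical.

```python
import math
import random


def k_nearest(points, k):
    """Return, for each point index, the indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(others[:k])
    return neighbors


def influence_weighted_oversample(minority, majority, k=3, seed=0):
    """Add synthetic minority samples until both classes have the same size.

    ASSUMPTION (not from the paper): influence = 1 + reverse-neighbor count,
    computed among minority samples only.
    """
    rng = random.Random(seed)
    n_new = len(majority) - len(minority)
    if n_new <= 0 or len(minority) < 2:
        return list(minority)

    neighbors = k_nearest(minority, k)

    # Reverse-neighbor count: how often each sample appears in others' lists.
    influence = [1] * len(minority)
    for neighbor_list in neighbors:
        for j in neighbor_list:
            influence[j] += 1

    synthetic = []
    for _ in range(n_new):
        # Pick a base sample with probability proportional to its influence.
        i = rng.choices(range(len(minority)), weights=influence, k=1)[0]
        # Interpolate toward one of its nearest minority neighbors.
        j = rng.choice(neighbors[i])
        t = rng.random()
        base, partner = minority[i], minority[j]
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, partner)))

    return list(minority) + synthetic
```

Calling `influence_weighted_oversample(minority, majority)` with two lists of coordinate tuples returns the original minority samples followed by enough synthetic ones to match the majority class size. Samples with more reverse neighbors are chosen as interpolation bases more often, which is one plausible way to let influence steer where synthetic points are placed.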

Keywords: imbalanced dataset; over-sampling; classification; reverse neighbors

Meng Guoqing (孟国庆), Gao Yuan (高源), Mei Ying (梅颖), Lu Chengbo (卢诚波)


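School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018

Lishui Power Supply Company, State Grid Zhejiang Electric Power Co., Ltd., Lishui, Zhejiang 323050

School of Mathematics and Computer Science, Lishui University, Lishui, Zhejiang 323000

Zhejiang Detu Network Co., Ltd. (浙江得图网络有限公司), Lishui, Zhejiang 310011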



2024

Software Guide (软件导刊)
Hubei Information Society (湖北省信息学会)


Impact factor: 0.524
ISSN: 1672-7800
Year, volume (issue): 2024, 23(9)