软件导刊 (Software Guide), 2024, Vol. 23, Issue 9: 116-121. DOI: 10.11907/rjdk.232015

A Neighborhood-Based Over-Sampling Algorithm for Imbalanced Datasets (面向不平衡数据集的一种基于邻域的过采样算法)

Meng Guoqing (孟国庆)¹, Gao Yuan (高源)², Mei Ying (梅颖)³, Lu Chengbo (卢诚波)³

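Author information

  • 1. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018
  • 2. Lishui Power Supply Company, State Grid Zhejiang Electric Power Co., Ltd., Lishui, Zhejiang 323050
  • 3. School of Mathematics and Computer Science, Lishui University, Lishui, Zhejiang 323000; Zhejiang Detu Network Co., Ltd. (浙江得图网络有限公司), Lishui, Zhejiang 310011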

Abstract

Oversampling is a commonly used way to address imbalanced class distributions in a dataset by synthesizing new samples of the same class. To address imbalanced sample distributions, a neighborhood-based algorithm, PSON, is proposed. The algorithm defines an influence for each minority-class sample and oversamples the minority class according to these differing influences to obtain a balanced dataset. Classification tests were run on the datasets produced by 8 oversampling algorithms across 50 datasets. The Wilcoxon signed-rank test was used to compare 7 classification performance metrics, and the results show that applying PSON significantly improves classification accuracy.
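The abstract does not say how PSON defines influence or how it synthesizes samples, so the following is only an illustrative sketch of the general idea of influence-weighted oversampling, not the authors' method. It assumes, hypothetically, that a minority sample's influence is its reverse-neighbor count (how many other minority samples list it among their k nearest neighbors), and it borrows SMOTE-style linear interpolation to create new points. The function names `k_nearest`, `reverse_neighbor_counts`, and `oversample` are made up for this example.

```python
import math
import random


def k_nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def reverse_neighbor_counts(points, k):
    """counts[i] = number of points that have i among their k nearest neighbors."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in k_nearest(i, points, k):
            counts[j] += 1
    return counts


def oversample(minority, majority, k=3, seed=0):
    """Return synthetic minority points so that both classes have equal size."""
    rng = random.Random(seed)
    n_new = len(majority) - len(minority)
    if n_new <= 0 or len(minority) < 2:
        return []
    k = min(k, len(minority) - 1)
    # ASSUMPTION: influence = reverse-neighbor count inside the minority class.
    # The +1 keeps samples with a zero count selectable.
    weights = [c + 1 for c in reverse_neighbor_counts(minority, k)]
    synthetic = []
    for _ in range(n_new):
        # Pick a seed sample with probability proportional to its influence.
        i = rng.choices(range(len(minority)), weights=weights)[0]
        # Interpolate toward one of its minority neighbors (SMOTE-style step).
        j = rng.choice(k_nearest(i, minority, k))
        t = rng.random()
        synthetic.append(
            tuple(a + t * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority = [(5.0 + i, 5.0) for i in range(10)]
new_points = oversample(minority, majority, k=2)
print(len(minority) + len(new_points), len(majority))
```

Weighting seed selection by influence is the only part tied to the abstract; the choice of reverse-neighbor counts, the smoothing term, and the interpolation step are all stand-ins that the paper itself may define differently.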

Key words

imbalanced dataset / over-sampling / classification / reverse neighbors

Publication year

2024
软件导刊 (Software Guide)
湖北省信息学会 (Hubei Provincial Information Society)

Impact factor: 0.524
ISSN: 1672-7800