Journal of Minnan Normal University (Natural Science) 2024, Vol. 37, Issue 1: 54-64. DOI: 10.12457/j.issn.2095-7122.2024.01.006

A synthetic algorithm of advantaged boundary for minority class samples based on weighted distance

(基于权重距离的优势边界小类样本合成算法)

HE Tianzhong (何田中) 1, ZHENG Yifeng (郑艺峰) 1, HU Minjie (胡敏杰) 1

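Author information

  • 1. Fujian Provincial University Key Laboratory of Data Science and Intelligent Applications, Minnan Normal University, Zhangzhou, Fujian 363000, China; School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000, China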

Abstract

An advantaged-boundary synthetic oversampling algorithm for minority class samples based on weighted distance (ABWD) is proposed to address class imbalance in data sets. The ABWD algorithm has three characteristics: 1) it defines a weighted distance and selects each sample's neighbors based on this distance; 2) it determines from those neighbors whether a sample is a boundary sample of the minority class; 3) for each minority-class boundary sample, it determines the positions and number of synthetic samples so that, after synthesis, the minority-class samples among its neighbors are no fewer than the majority-class samples, which ensures an advantaged boundary for that sample. Experimental results show that, compared with other typical oversampling algorithms, ABWD considerably improves classification performance on the minority class and achieves good results on three measures: G-mean, F-measure and recall.
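The three steps listed in the abstract can be illustrated with a small sketch. The abstract does not give the actual ABWD formulas, so everything concrete here is an assumption made for illustration: the weighted distance is taken to be a weighted Euclidean distance with non-negative per-feature weights, a minority sample is treated as a boundary sample when majority-class points outnumber minority-class points among its k nearest neighbors, and synthetic points are placed on the segment toward the nearest minority neighbor, closer than the nearest majority neighbor. The function names (`weighted_distance`, `neighbour_counts`, `synthesize_boundary`) are hypothetical and do not come from the paper.

```python
import math
import random


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance (assumed form; the paper's definition may differ)."""
    return math.sqrt(sum(w * (p - q) ** 2 for w, p, q in zip(weights, a, b)))


def neighbour_counts(x, pool, weights, k, minority):
    """Count (minority, majority) labels among the k points of `pool` nearest to x."""
    nearest = sorted(pool, key=lambda item: weighted_distance(x, item[0], weights))[:k]
    n_min = sum(1 for _, label in nearest if label == minority)
    return n_min, len(nearest) - n_min


def synthesize_boundary(X, y, weights, k=5, minority=1, seed=0):
    """Return synthetic minority points for majority-dominated minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for i, (x, label) in enumerate(zip(X, y)):
        if label != minority:
            continue
        # Neighbour candidates: every original sample except x itself.
        pool = [(X[j], y[j]) for j in range(len(X)) if j != i]

        def dist(item):
            return weighted_distance(x, item[0], weights)

        majority_d = [dist(it) for it in pool if it[1] != minority]
        minority_nb = [it for it in pool if it[1] == minority and dist(it) > 0]
        if not majority_d or not minority_nb or min(majority_d) == 0:
            continue  # no majority to balance against, or no usable direction
        target = min(minority_nb, key=dist)[0]   # nearest other minority sample
        d_target = weighted_distance(x, target, weights)
        reach = min(d_target, min(majority_d))   # stay inside the nearest majority point
        # Add points one at a time until minority >= majority among the k neighbours.
        while True:
            n_min, n_maj = neighbour_counts(x, pool, weights, k, minority)
            if n_min >= n_maj:
                break
            step = rng.uniform(0.1, 0.9) * reach / d_target
            new_point = [p + step * (q - p) for p, q in zip(x, target)]
            pool.append((new_point, minority))
            synthetic.append(new_point)
    return synthetic


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [2.0, 2.0]]
    y = [0, 0, 0, 0, 1, 1]
    for point in synthesize_boundary(X, y, weights=[1.0, 1.0], k=3):
        print(point)
```

Because every synthetic point in this sketch lies strictly closer to the sample than its nearest majority neighbor, the loop stops after at most about k/2 additions per sample. How the paper actually chooses the weights, the boundary criterion and the synthesis positions may differ from these choices.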

Key words

data mining/imbalanced data/oversampling/advantaged boundary/weighted distance


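Funding

National Natural Science Foundation of China (62376114)

Natural Science Foundation of Fujian Province (2021J011003)

Natural Science Foundation of Fujian Province (2021J011004)

Natural Science Foundation of Fujian Province (2021J011006)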

Publication year

2024
Journal of Minnan Normal University (Natural Science)
Zhangzhou Normal University (漳州师范学院)

Impact factor: 0.272
ISSN: 1008-7826
Number of references: 22