A synthetic algorithm of advantaged boundary for minority class samples based on weighted distance

To address class imbalance in data sets, this paper proposes a synthetic algorithm of advantaged boundary for minority class samples based on weighted distance (ABWD). The ABWD algorithm has three characteristics: 1) it defines a weighted distance and selects each sample's neighbors according to this distance; 2) it uses a sample's neighbors to determine whether the sample is a boundary sample of the minority class; 3) for each minority class boundary sample, it determines the positions and number of synthetic samples so that, after synthesis, the minority class samples among its neighbors are no fewer than the majority class samples, which ensures that the minority class sample has an advantaged boundary. Experimental results show that, compared with other typical oversampling algorithms, ABWD substantially improves the classification performance of the minority class and achieves good results on three measures: G-mean, F-measure, and recall.
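The three evaluation measures named in the abstract can be computed from a binary confusion matrix in which the minority class is treated as the positive class. The sketch below is illustrative only: the abstract does not give the exact definitions used in the paper, so it assumes the common ones (F-measure with equal weight on precision and recall, and G-mean as the geometric mean of minority-class recall and majority-class specificity). The function name `minority_metrics` and the example counts are hypothetical.

```python
import math


def minority_metrics(tp, fn, fp, tn):
    """Return (recall, f_measure, g_mean), treating the minority class as positive.

    Assumed definitions (not taken from the paper):
      recall    = TP / (TP + FN)
      F-measure = 2 * precision * recall / (precision + recall)
      G-mean    = sqrt(recall * specificity)
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    g_mean = math.sqrt(recall * specificity)
    return recall, f_measure, g_mean


# Hypothetical counts, chosen only for illustration:
# 10 minority samples (8 found, 2 missed), 90 majority samples (10 misclassified).
recall, f_measure, g_mean = minority_metrics(tp=8, fn=2, fp=10, tn=80)
print(f"recall={recall:.3f}  F-measure={f_measure:.3f}  G-mean={g_mean:.3f}")
```

Unlike overall accuracy, all three measures fall sharply when the minority class is poorly recognized, which is why they are commonly reported for oversampling methods.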

Keywords: data mining; imbalanced data; oversampling; advantaged boundary; weighted distance

Authors: He Tianzhong (何田中), Zheng Yifeng (郑艺峰), Hu Minjie (胡敏杰)

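Fujian Provincial University Key Laboratory of Data Science and Intelligent Application, Minnan Normal University, Zhangzhou, Fujian 363000

School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000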

Funding: National Natural Science Foundation of China (62376114); Natural Science Foundation of Fujian Province (2021J011003, 2021J011004, 2021J011006)

2024

Journal of Minnan Normal University (Natural Science)
Zhangzhou Normal University (漳州师范学院)


Impact factor: 0.272
ISSN: 1008-7826
Year, volume (issue): 2024, 37(1)