A synthetic algorithm of advantaged boundary for minority class samples based on weighted distance
He Tianzhong (何田中) 1, Zheng Yifeng (郑艺峰) 1, Hu Minjie (胡敏杰) 1
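Author information
- 1. Fujian Province University Key Laboratory of Data Science and Intelligence Application, Minnan Normal University, Zhangzhou, Fujian 363000, China; School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000, China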
Abstract
An advantaged-boundary minority-class sample synthesis algorithm based on weighted distance (ABWD) is proposed to address class imbalance in data sets. The ABWD algorithm has three characteristics. First, it defines a weighted distance and selects each sample's nearest neighbors based on this distance. Second, it determines from a sample's nearest neighbors whether that sample is a boundary sample of the minority class. Third, for each boundary sample of the minority class, it determines the positions and the number of synthetic samples so that, after synthesis, the minority-class samples among its nearest neighbors are no fewer than the majority-class samples, which ensures that the sample has an advantaged boundary. Experimental results show that, compared with other typical oversampling algorithms, the proposed algorithm considerably improves classification performance on the minority class and achieves good results on three measures: G-mean, F-measure, and recall.
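The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' ABWD implementation: the abstract does not give the distance formula, the boundary criterion, or the placement rule, so the sketch assumes a feature-weighted Euclidean distance, treats a minority sample as a boundary sample when majority-class neighbors outnumber minority-class neighbors among its k nearest neighbors, and places synthetic samples by SMOTE-style interpolation toward minority-class neighbors. The names `weighted_distance`, `k_nearest`, and `oversample_boundary`, and the parameters `weights`, `k`, and `seed`, are hypothetical.

```python
import math
import random


def weighted_distance(a, b, weights):
    """Feature-weighted Euclidean distance (assumed form; not taken from the paper)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def k_nearest(idx, X, weights, k):
    """Indices of the k nearest neighbors of X[idx] under the weighted distance."""
    others = [j for j in range(len(X)) if j != idx]
    others.sort(key=lambda j: weighted_distance(X[idx], X[j], weights))
    return others[:k]


def oversample_boundary(X, y, weights, k=5, minority=1, seed=0):
    """Return synthetic minority samples for (assumed) boundary samples."""
    rng = random.Random(seed)
    synthetic = []
    for i, label in enumerate(y):
        if label != minority:
            continue
        nbrs = k_nearest(i, X, weights, k)
        min_nbrs = [j for j in nbrs if y[j] == minority]
        n_maj = len(nbrs) - len(min_nbrs)
        # Assumed boundary rule: majority neighbors outnumber minority ones.
        # Samples with no minority neighbor at all are skipped in this sketch.
        if n_maj <= len(min_nbrs) or not min_nbrs:
            continue
        # Add enough samples to close the gap between the two neighbor counts.
        for _ in range(n_maj - len(min_nbrs)):
            j = rng.choice(min_nbrs)
            gap = rng.random()
            # SMOTE-style interpolation (a swapped-in placement rule).
            synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy data: two minority samples (label 1) near three majority samples (label 0).
X = [[0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [5.0, 5.0], [6.0, 6.0]]
y = [1, 1, 0, 0, 0, 0, 0]
new_samples = oversample_boundary(X, y, weights=[1.0, 1.0], k=3)
print(len(new_samples))
```

The count `n_maj - len(min_nbrs)` only closes the gap within the original neighborhood. It ignores the fact that new samples can displace existing neighbors, which a full method would have to account for when choosing positions.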
Key words
data mining/imbalanced data/oversampling/advantaged boundary/weighted distance
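Funding
National Natural Science Foundation of China (62376114)
Natural Science Foundation of Fujian Province (2021J011003)
Natural Science Foundation of Fujian Province (2021J011004)
Natural Science Foundation of Fujian Province (2021J011006)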
Publication year
2024