

An Imbalanced Multi-Label Data Ensemble Learning Method Based on Safe Under-Sampling
Multi-label classification tasks are widespread in real life, but they often suffer from imbalanced data, which seriously degrades classification performance. At present, the mainstream technique for addressing this problem is resampling, which falls into two categories: over-sampling and under-sampling. Over-sampling generates samples related to minority-class labels, whereas under-sampling removes samples related to majority-class labels. However, each of these methods focuses on only one type of imbalance, either intra-label imbalance or inter-label imbalance, and may therefore introduce the other type while resolving the first. To address this issue, this paper proposes ESUS (Ensemble learning method based on Safe Under-Sampling), an ensemble learning method for imbalanced multi-label data. First, label partitioning divides the imbalanced multi-label dataset into single-label datasets and label-pair datasets. For the single-label datasets, a safe under-sampling method is proposed to resolve intra-label imbalance, and binary classification models are built on the resulting balanced datasets. For the label-pair datasets, the data are pruned and ensemble learning is then applied to resolve inter-label imbalance, which maintains classification performance while reducing time and space complexity. Finally, the single-label models and the label-pair models are integrated into the final classification model. Experimental results on six imbalanced multi-label datasets show that, compared with seven baseline methods, ESUS is more stable and effective on four evaluation metrics.
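The two kinds of imbalance named in the abstract can be made concrete with simple measures. The sketch below assumes a binary label matrix `Y` (one row per sample, one column per label). For intra-label imbalance it uses each label's negative-to-positive ratio; for inter-label imbalance it uses IRLbl (imbalance ratio per label) and its average MeanIR, measures common in the multi-label imbalance literature. The abstract does not say which measures ESUS uses, so the function names and definitions here are illustrative assumptions.

```python
from typing import List

Matrix = List[List[int]]  # rows = samples, columns = labels (0/1)


def intra_label_ratios(Y: Matrix) -> List[float]:
    """Intra-label imbalance: #negative / #positive samples for each label."""
    n, q = len(Y), len(Y[0])
    ratios = []
    for j in range(q):
        pos = sum(row[j] for row in Y)
        neg = n - pos
        # A label with no positive samples is treated as infinitely imbalanced.
        ratios.append(neg / pos if pos else float("inf"))
    return ratios


def irlbl(Y: Matrix) -> List[float]:
    """Inter-label imbalance (IRLbl): positives of the most frequent label
    divided by positives of each label; 1.0 marks the most frequent label."""
    q = len(Y[0])
    counts = [sum(row[j] for row in Y) for j in range(q)]
    top = max(counts)
    return [top / c if c else float("inf") for c in counts]


def mean_ir(Y: Matrix) -> float:
    """MeanIR: the average IRLbl over all labels."""
    values = irlbl(Y)
    return sum(values) / len(values)


if __name__ == "__main__":
    Y = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
    print(intra_label_ratios(Y))  # [0.3333333333333333, 3.0, 1.0]
    print(irlbl(Y))               # [1.0, 3.0, 1.5]
    print(mean_ir(Y))
```

In this toy matrix, label 0 is a majority label within its own column (3 positives vs. 1 negative), while label 1 is rare both within its column and relative to the other labels, which is why balancing one view can leave the other skewed.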
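Label partitioning, as described in the abstract, yields one binary dataset per label and one dataset per label pair. The abstract does not specify how the label-pair datasets are built, so the minimal sketch below assumes the common pairwise decomposition: for labels i and j, keep only samples where exactly one of the two labels is relevant, and use label i's relevance as the target. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Sample = Sequence[float]
Dataset = List[Tuple[Sample, int]]


def single_label_datasets(X: List[Sample], Y: List[List[int]]) -> List[Dataset]:
    """One binary dataset per label j: every sample paired with its value for label j."""
    q = len(Y[0])
    return [[(x, y[j]) for x, y in zip(X, Y)] for j in range(q)]


def label_pair_datasets(X: List[Sample], Y: List[List[int]]) -> Dict[Tuple[int, int], Dataset]:
    """One dataset per label pair (i, j) with i < j.

    Assumption: only samples where exactly one of the two labels is relevant are
    kept; the target is 1 when label i is the relevant one (i preferred over j).
    """
    q = len(Y[0])
    pairs: Dict[Tuple[int, int], Dataset] = {}
    for i, j in combinations(range(q), 2):
        pairs[(i, j)] = [(x, y[i]) for x, y in zip(X, Y) if y[i] != y[j]]
    return pairs


if __name__ == "__main__":
    X = [[0.1], [0.2], [0.3], [0.4]]
    Y = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
    print(len(single_label_datasets(X, Y)), len(label_pair_datasets(X, Y)))  # 3 3
```

With q labels this produces q single-label datasets and q(q-1)/2 pair datasets. Because each pair dataset discards samples where both labels agree, it is usually much smaller than the full dataset, which is one reason pruning and ensembling the pair models can keep time and space costs down.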
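The abstract does not define what makes ESUS's under-sampling "safe". As a purely hypothetical stand-in, the sketch below applies a neighbourhood rule to one single-label binary dataset: a majority-class sample is treated as safe to drop only if all of its k nearest neighbours are also majority-class, so samples near the class boundary are never removed. Safe samples are dropped at random until the classes are balanced or no candidates remain. The rule, `k`, `seed`, and the function names are assumptions, not the paper's method.

```python
import math
import random
from typing import List, Sequence


def knn_indices(X: List[Sequence[float]], idx: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of X[idx] (Euclidean), excluding idx."""
    dists = sorted((math.dist(X[idx], X[m]), m) for m in range(len(X)) if m != idx)
    return [m for _, m in dists[:k]]


def safe_under_sample(X: List[Sequence[float]], y: List[int], k: int = 3, seed: int = 0) -> List[int]:
    """Hypothetical 'safe' under-sampling for one binary dataset.

    Majority samples whose k nearest neighbours are all majority-class lie inside
    the majority region and are candidates for removal; boundary samples are kept.
    Neighbourhoods are computed once on the original data. Returns kept indices.
    """
    pos = sum(y)
    neg = len(y) - pos
    majority = 1 if pos > neg else 0
    surplus = abs(pos - neg)  # how many majority samples must go to balance
    safe = [
        i for i in range(len(y))
        if y[i] == majority and all(y[m] == majority for m in knn_indices(X, i, k))
    ]
    rng = random.Random(seed)
    rng.shuffle(safe)
    dropped = set(safe[:surplus])  # may drop fewer if safe candidates run out
    return [i for i in range(len(y)) if i not in dropped]


if __name__ == "__main__":
    X = [[0], [1], [2], [3], [4], [9], [10], [11]]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    print(safe_under_sample(X, y))  # keeps the boundary sample 5 and both positives
```

Compared with random under-sampling, a rule of this kind only thins out redundant interior majority samples, so the balanced dataset still carries the boundary information the binary classifier needs.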

Keywords: multi-label classification; imbalanced data; label partitioning; safe under-sampling; data pruning; ensemble learning

Sun Zhongbin (孙中彬), Diao Yuxuan (刁宇轩), Ma Suyang (马苏洋)


School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China

Engineering Research Center of Mine Digitization, Ministry of Education, Xuzhou, Jiangsu 221116, China


Funding: Fundamental Research Funds for the Central Universities (2021QN1075)

2024

Acta Electronica Sinica (电子学报)
Chinese Institute of Electronics


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.237
ISSN:0372-2112
Year, Volume (Issue): 2024, 52(10)