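Multi-label classification tasks are common in real-world applications, but they often suffer from imbalanced data, which seriously degrades classification performance. The mainstream remedy is resampling, which falls into two families: over-sampling generates samples associated with minority labels, while under-sampling deletes samples associated with majority labels. However, these methods each target only one kind of imbalance, either intra-label or inter-label, so resolving one kind may introduce the other. To address this, this paper proposes ESUS (Ensemble learning method based on Safe Under-Sampling), an ensemble learning method for imbalanced multi-label data. First, label partitioning divides the imbalanced multi-label dataset into single-label datasets and label-pair datasets. For the single-label datasets, a safe under-sampling method is proposed to resolve intra-label imbalance, and binary classifiers are built on the resulting balanced datasets. For the label-pair datasets, the data are pruned and ensemble learning is then used to resolve inter-label imbalance, reducing time and space complexity while preserving classification performance. Finally, the single-label models and the label-pair models are combined into the final classifier. Experiments on six imbalanced multi-label datasets show that, compared with seven baseline methods, ESUS is more stable and effective on four evaluation metrics.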
An Imbalanced Multi-Label Data Ensemble Learning Method Based on Safe Under-Sampling
Multi-label classification is widespread in real-world applications, but it often suffers from imbalanced data, which seriously degrades classification performance. At present, the mainstream technique for addressing this problem is resampling, which is mainly divided into over-sampling and under-sampling. Over-sampling generates samples related to minority labels, while under-sampling removes samples related to majority labels. However, each of these methods focuses on only one type of imbalance, namely intra-label imbalance or inter-label imbalance, and may therefore introduce the other type while resolving the first. In response to this issue, this paper proposes ESUS (Ensemble learning method based on Safe Under-Sampling), an ensemble learning method for imbalanced multi-label data. First, the imbalanced multi-label dataset is divided into single-label datasets and label-pair datasets through label partitioning. For the single-label datasets, this paper proposes a safe under-sampling method to resolve intra-label imbalance and constructs binary classification models from the resulting balanced datasets. For the label-pair datasets, the data are pruned and ensemble learning is then applied to resolve inter-label imbalance, which maintains classification performance while reducing time and space complexity. Finally, the single-label models and the label-pair models are integrated into the final classification model. Experimental results on six imbalanced multi-label datasets show that, compared with seven baseline methods, ESUS is more stable and effective on four evaluation metrics.
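The label-partitioning step described above can be illustrated with a minimal Python sketch. It is not the ESUS implementation: the abstract does not specify how label-pair datasets are built or how safe under-sampling selects samples, so the function names (`split_single_label`, `split_label_pairs`, `random_undersample`), the pairwise-comparison style pair split, and the use of plain random under-sampling as a placeholder are all assumptions made for illustration. Data pruning, model training, and the final ensemble are omitted.

```python
import random
from itertools import combinations


def split_single_label(Y):
    """Return one binary target vector per label (one dataset per label)."""
    n_labels = len(Y[0])
    return {j: [row[j] for row in Y] for j in range(n_labels)}


def split_label_pairs(Y):
    """Return, for each label pair (j, k), the samples carrying exactly one of them.

    Assumption: a pairwise-comparison style split. Each entry is
    (sample_index, target), where target 1 means the sample has label j.
    """
    n_labels = len(Y[0])
    pairs = {}
    for j, k in combinations(range(n_labels), 2):
        pairs[(j, k)] = [(i, row[j]) for i, row in enumerate(Y) if row[j] != row[k]]
    return pairs


def random_undersample(target, seed=0):
    """Balance a binary target by randomly dropping majority-class indices.

    Placeholder only: plain random under-sampling, not the safe variant
    proposed in the paper.
    """
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(target) if t == 1]
    neg = [i for i, t in enumerate(target) if t == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


# Toy label matrix: 6 samples, 3 labels (label 1 is rare, label 2 is frequent).
Y = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
]
single = split_single_label(Y)
pairs = split_label_pairs(Y)
balanced = {j: random_undersample(t) for j, t in single.items()}
```

Here `balanced[j]` holds the sample indices retained for label `j` after balancing, and `pairs[(j, k)]` holds the samples that distinguish label `j` from label `k`. In a fuller pipeline, each of these subsets would feed its own binary classifier before the outputs are combined.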