Multi-label sampling based on local label imbalance

Original text links:

NSTL
Elsevier

Abstract: Class imbalance is an inherent characteristic of multi-label data that hinders most multi-label learning methods. One efficient and flexible strategy to deal with this problem is to employ sampling techniques before training a multi-label learning model. Although existing multi-label sampling approaches alleviate the global imbalance of multi-label datasets, it is actually the imbalance level within the local neighbourhood of minority class examples that plays a key role in performance degradation. To address this issue, we propose a novel measure to assess the local label imbalance of multi-label datasets, as well as two multi-label sampling approaches, namely Multi-Label Synthetic Oversampling based on Local label imbalance (MLSOL) and Multi-Label Undersampling based on Local label imbalance (MLUL). By considering all informative labels, MLSOL creates more diverse and better labeled synthetic instances for difficult examples, while MLUL eliminates instances that are harmful to their local region. Experimental results on 13 multi-label datasets demonstrate the effectiveness of the proposed measure and sampling approaches for a variety of evaluation metrics, particularly in the case of an ensemble of classifiers trained on repeated samples of the original data. © 2021 Elsevier Ltd. All rights reserved.
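
The abstract does not spell out the proposed measure, so the following is only an illustrative sketch of one plausible reading of "local label imbalance": for each instance and each label, the fraction of its k nearest neighbours whose value for that label differs from its own. The function name `local_label_imbalance`, the Euclidean distance, and the default `k` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def local_label_imbalance(X, Y, k=3):
    """Hypothetical sketch: per-instance, per-label neighbourhood disagreement.

    X: (n, d) feature matrix; Y: (n, q) binary label matrix.
    Returns an (n, q) array whose entry [i, j] is the fraction of the k
    nearest neighbours of instance i that disagree with it on label j.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=int)
    # Pairwise Euclidean distances (assumed metric, fine for small n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # an instance is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest neighbours
    # Y[nn] has shape (n, k, q); compare each neighbour with the instance itself.
    disagree = Y[nn] != Y[:, None, :]
    return disagree.mean(axis=1)

# Toy data: one positive example (x = 2) surrounded by negatives.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
Y_toy = np.array([[0], [0], [1], [0], [0]])
C_toy = local_label_imbalance(X_toy, Y_toy, k=2)
```

Under this reading, a minority example whose neighbours all carry the opposite label value receives the highest score, which matches the abstract's notion of a "difficult" example that oversampling should favour.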

Keywords:

Multi-label learning; Class imbalance; Oversampling and undersampling; Local label imbalance; Ensemble methods; Classification

Authors:

Liu, Bin; Blekas, Konstantinos; Tsoumakas, Grigorios


Affiliations:

Aristotle Univ Thessaloniki

Univ Ioannina

Publication year:

2022

DOI:

10.1016/j.patcog.2021.108294

Pattern Recognition

Indexed in: EI, SCI

ISSN: 0031-3203

Year, volume (issue): 2022, 122

Cited by: 9
References: 38