Journal of Shanxi University (Natural Science Edition), 2024, Vol. 47, Issue 1: 59-68. DOI: 10.13451/j.sxu.ns.2023141

Multi-label Specific Features Learning Algorithm Based on Label Correlation

Li Hua¹, Wang Zhijie¹

Author information

  • 1. Department of Mathematics and Physics, Shijiazhuang Tiedao University, Shijiazhuang 050043, Hebei, China

Abstract

The bilabel-specific features algorithm for multi-label classification (BILAS) is a representative multi-label learning algorithm. However, for each label pair it considers only the samples on which the two labels take different values and ignores the samples on which they take the same value, so the generated label-specific features cannot characterize the label information comprehensively and accurately. To address this shortcoming, a multi-label specific features learning algorithm based on label correlation is proposed, which exploits the second-order correlation between labels and generates label-specific features from all types of samples of each label pair. First, a distance-based prototype learning method selects prototypes for every label pair, and the corresponding label-specific features are generated from them; then a multi-label classifier is constructed using the idea of the label powerset. In experiments on five publicly available test datasets from MULAN (a Java library for multi-label learning), the proposed algorithm ranks first in comprehensive average ranking over five multi-label evaluation metrics when compared with BILAS and the multi-label classification algorithm via calibrated label ranking (CLR), with improvements of 20.4% and 37.1% over BILAS and CLR, respectively, indicating that the proposed algorithm performs well.
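
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It groups samples by the joint value of one label pair (all four combinations, including the equal-valued ones that BILAS is said to ignore), picks one prototype per non-empty group, and maps every sample to its distances from those prototypes. The function names, the single-prototype-per-group choice, and the nearest-to-mean selection rule are all hypothetical.

```python
import numpy as np

def pair_codes(Y, j, k):
    """Encode the values of label pair (j, k) as a label-powerset class in {0, 1, 2, 3}."""
    return 2 * Y[:, j] + Y[:, k]

def select_prototypes(X, codes):
    """Assumed rule: for each non-empty pair class, take the sample nearest to the class mean."""
    protos = []
    for c in range(4):
        idx = np.flatnonzero(codes == c)
        if idx.size == 0:
            continue  # this label combination does not occur in the data
        mean = X[idx].mean(axis=0)
        nearest = idx[np.argmin(np.linalg.norm(X[idx] - mean, axis=1))]
        protos.append(X[nearest])
    return np.vstack(protos)

def specific_features(X, protos):
    """Represent each sample by its Euclidean distances to the prototypes."""
    return np.linalg.norm(X[:, None, :] - protos[None, :, :], axis=2)

# Toy data: 6 samples, 2 features, 2 labels (values invented for illustration).
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
Y = np.array([[0, 0], [0, 0], [0, 1], [0, 1], [1, 1], [1, 1]])

codes = pair_codes(Y, 0, 1)
protos = select_prototypes(X, codes)
Z = specific_features(X, protos)
```

In this sketch the codes 0 and 3 correspond to samples whose two labels agree, and codes 1 and 2 to samples whose labels differ; a four-class classifier trained on `Z` with `codes` as targets would be one way to apply the label powerset idea to the pair.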

Key words

multi-label learning/dimensionality reduction/similarity/prototype learning/label powerset

Funding

National Natural Science Foundation of China (61806133)

Publication year

2024
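Journal of Shanxi University (Natural Science Edition)
Shanxi University

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.287
ISSN: 0253-2395
Number of references: 31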