An Adaptive Low-Rank Algorithm for Multi-Task AUC Learning

In recent years, benefiting from the excellent performance and efficiency of deep neural networks (DNNs), machine learning has achieved great success in various fields, such as natural language processing, computer vision, medical named entity recognition, and medical image analysis. Within this area, multi-task learning (MTL) exploits the correlation between related tasks to transfer knowledge, enabling a model to retain good generalization performance even when data are insufficient. Over the past decade, most existing methods have been designed for balanced class distributions and evaluated with accuracy-based metrics. However, many practical applications, such as disease detection and spam filtering, suffer from imbalanced sample distributions, which degrades the performance of DNNs. Furthermore, multi-task learning places high demands on task relevance and is prone to negative transfer: when a model shares knowledge across tasks, irrelevant knowledge may mislead training in the wrong direction. As a result, most existing methods cannot be applied effectively in such scenarios. Designing a multi-task learning algorithm that can learn from imbalanced samples with low-correlation tasks is therefore of paramount importance to practical applications, and represents a critical machine learning challenge.

This paper proposes a multi-task AUC optimization method based on an adaptive low-rank Factor Nuclear Norm minus Frobenius Norm (FNNFN) regularizer, dubbed MTAUC-FNNFN, to achieve robustness on imbalanced data with weakly related tasks. First, the area under the ROC curve (AUC), the measure usually adopted under imbalanced distributions, is introduced to directly reflect model performance across tasks. Because the AUC loss function is discontinuous and non-differentiable, this work establishes a novel
multi-task learning algorithm for AUC optimization, which greatly improves the AUC value in imbalanced scenarios. Meanwhile, to enable efficient optimization, the method reformulates the original pairwise AUC objective as an instance-wise minimax problem, reducing the per-iteration complexity from O(L·n_{i,+}·n_{i,-}) to O(L·(n_{i,+} + n_{i,-})). On top of this well-formed optimization objective, the factor parameters can be easily updated with the gradient descent ascent method. To resist the negative effect of irrelevant tasks, this paper further introduces an adaptive low-rank regularization term, FNNFN, which eliminates negative transfer in multi-task learning and improves the generalization performance of the model. Specifically, penalizing the small singular values empirically amounts to discarding trivial information, so the low-rank structure retains only the relevant information within the matrix parameters used for knowledge sharing.

For a comprehensive assessment, we compare the proposed method against other methods, including multi-task learning methods, AUC optimization methods, and low-rank representation methods. For fairness, all methods are applied uniformly to multi-task datasets, and their performance is evaluated with the AUC metric. Experimental results on four simulated datasets and three real-world datasets, Landmine, MHC-I, and USPS, consistently demonstrate the effectiveness of the proposed algorithm.
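The abstract does not spell out the exact minimax objective, but the complexity reduction it cites can be illustrated with the standard squared AUC surrogate, whose pairwise form over all positive/negative pairs collapses to an instance-wise expression built from per-class means and variances. The sketch below (function names are ours, not the paper's) verifies this identity numerically:

```python
import numpy as np

def pairwise_sq_auc_loss(s_pos, s_neg):
    """Mean of (1 - (s_i^+ - s_j^-))^2 over all pos/neg pairs: O(n+ * n-)."""
    diff = s_pos[:, None] - s_neg[None, :]  # all pairwise score differences
    return np.mean((1.0 - diff) ** 2)

def instancewise_sq_auc_loss(s_pos, s_neg):
    """Same value from per-class first/second moments only: O(n+ + n-)."""
    m_pos, m_neg = s_pos.mean(), s_neg.mean()
    v_pos, v_neg = s_pos.var(), s_neg.var()  # biased (population) variances
    return v_pos + v_neg + (1.0 - (m_pos - m_neg)) ** 2

rng = np.random.default_rng(0)
s_pos = rng.normal(1.0, 0.5, size=30)    # scores on positive samples
s_neg = rng.normal(0.0, 0.5, size=200)   # scores on negative samples
assert np.isclose(pairwise_sq_auc_loss(s_pos, s_neg),
                  instancewise_sq_auc_loss(s_pos, s_neg))
```

The instance-wise form touches each sample once, which is what makes stochastic minimax solvers applicable; the paper's actual objective may add the dual variable and task index explicitly.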
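For the gradient descent ascent updates mentioned above, a minimal sketch on a toy convex-concave saddle problem min_x max_y f(x, y) = x^2 + 2xy - y^2 (saddle point at the origin) shows the simultaneous descent/ascent step; step size and iteration count are illustrative choices, not values from the paper:

```python
def gda(x, y, eta=0.1, steps=200):
    """Simultaneous gradient descent (in x) ascent (in y) on x^2 + 2xy - y^2."""
    for _ in range(steps):
        gx = 2 * x + 2 * y                 # df/dx
        gy = 2 * x - 2 * y                 # df/dy
        x, y = x - eta * gx, y + eta * gy  # descend on x, ascend on y
    return x, y

x, y = gda(1.0, 1.0)  # converges toward the saddle point (0, 0)
```

In the paper's setting, x would play the role of the factor parameters and y that of the auxiliary minimax variables of the reformulated AUC objective.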
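The FNNFN regularizer is, as its name indicates, built on the nuclear norm minus the Frobenius norm, ||W||_* - ||W||_F; since both norms depend only on the singular values, the penalty vanishes on rank-1 matrices and grows as mass spreads over more singular values, which is the sense in which it suppresses small singular values. The sketch below computes the raw penalty via SVD (the "factor" variant in the paper presumably applies it through a low-rank factorization, which we do not reproduce here):

```python
import numpy as np

def nuc_minus_fro(W):
    """Nuclear norm minus Frobenius norm, computed from singular values."""
    s = np.linalg.svd(W, compute_uv=False)   # singular values, descending
    return s.sum() - np.sqrt((s ** 2).sum())

rng = np.random.default_rng(0)
u, v = rng.normal(size=(8, 1)), rng.normal(size=(1, 5))
W1 = u @ v                                   # rank-1: penalty is ~0
W2 = W1 + 0.5 * rng.normal(size=(8, 5))      # perturbation raises the rank
```

Minimizing this penalty on the shared parameter matrix therefore pushes it toward low rank, keeping only the dominant, task-relevant directions.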