Survey on Support Vector Machine Algorithms in Weakly Supervised Scenarios (弱监督场景下的支持向量机算法综述)

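Abstract (translated from Chinese): Support Vector Machine (SVM) is a statistical learning method built on the principle of structural risk minimization. Owing to its distinctive advantages on nonlinear, small-sample, and high-dimensional problems, it is widely used in image recognition, fault diagnosis, text classification, and other fields. SVM, however, is a supervised learning algorithm: it aims to train a learner from a large number of samples with unique and unambiguous ground-truth labels, and it struggles to perform well in weakly supervised scenarios such as incomplete, inexact, and polysemous supervision. This paper first explains the concept of weakly supervised scenarios and the relevant theory of SVM. Then, from the perspective of weakly supervised scenarios, it systematically reviews the current state and development of SVM algorithms, covering methods based on semi-supervised learning, multiple instance learning, and multi-label learning. Semi-supervised learning methods are subdivided by data assumption into cluster-assumption-based and manifold-assumption-based methods; multiple instance learning methods are subdivided by solution strategy into instance-level space, bag-level space, and embedded-space methods; multi-label learning methods are subdivided by processing approach into problem-transformation and algorithm-adaptation methods. The paper then summarizes the experimental results of some representative algorithms on public datasets, and finally discusses possible future research directions.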
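
For orientation, the structural risk minimization principle mentioned in the abstract is usually illustrated with the standard soft-margin SVM primal problem. The form below is the common textbook formulation, not necessarily the notation used in the paper itself; the symbols $w$, $b$, $\xi_i$, $C$, and the feature map $\phi$ are assumptions introduced here for illustration.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & y_i \bigl( w^{\top} \phi(x_i) + b \bigr) \ge 1 - \xi_i, \qquad i = 1, \dots, n, \\
& \xi_i \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
```

The first term controls model complexity (a larger margin), the second penalizes training errors, and $C > 0$ trades the two off. Every training sample $x_i$ needs a single known label $y_i \in \{-1, +1\}$, which is exactly the requirement that weakly supervised scenarios fail to meet.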

English title: Survey on Support Vector Machine Algorithms in Weakly Supervised Scenarios

English abstract: Support Vector Machine (SVM) is a statistical learning method based on the principle of structural risk minimization. It offers an intuitive geometric interpretation and a rigorous mathematical derivation, and it shows distinctive advantages in handling nonlinear, small-sample, and high-dimensional problems. SVM has attracted significant attention and has been widely applied in fields such as image recognition, fault diagnosis, and text classification. SVM is a classical supervised machine learning algorithm: it trains the learner on samples with complete, unique, and unambiguous ground-truth labels to ensure generalization ability. However, as real-world tasks become increasingly complex, creating such a sample set is laborious and difficult. On the one hand, data collection, cleaning, and debugging require significant time and cost. In specific domains, especially the medical field, experts often need to apply domain knowledge to process and label the samples. On the other hand, real-world learning tasks often change and evolve. For example, data annotation criteria, annotation granularity, or downstream use cases may change frequently, requiring samples to be re-labeled. Consequently, because labeling is so costly, a large number of samples in real-world applications lack complete and unambiguous labels. Moreover, samples in most practical task scenarios may be polysemous, that is, a sample can be associated with multiple labels at the same time. Therefore, standard SVM struggles to achieve satisfactory performance in weakly supervised scenarios such as incomplete supervision, inexact supervision, and polysemous supervision. Weakly supervised scenarios are defined in contrast to supervised scenarios: learning algorithms in weakly supervised scenarios train the learner on samples that may be limited, ambiguous, or only roughly labeled. From the perspective of weakly supervised scenarios, this survey systematically reviews the current research status and development of SVM algorithms. First, the concept of weakly supervised scenarios and the basic mathematical principle of SVM are briefly introduced. Second, the existing SVM algorithms for weakly supervised scenarios are divided into three categories according to learning paradigm, namely semi-supervised learning based methods, multiple instance learning based methods, and multi-label learning based methods. Specifically, the semi-supervised learning based methods are further subdivided, according to data assumptions, into cluster assumption based approaches and manifold assumption based approaches. The multiple instance learning based methods are further classified, according to how the problem is solved, into instance-level approaches, bag-level approaches, and embedded-space approaches. The multi-label learning based methods are further divided, according to processing strategy, into problem transformation based approaches and algorithm adaptation based approaches. This survey introduces the representative methods within these categories in detail and summarizes and analyzes their characteristics and shortcomings, offering a basis for selecting among SVM methods in different task scenarios. After that, the performance of some representative algorithms is evaluated and analyzed through experiments on publicly available datasets. Finally, potential research directions for SVM algorithms in weakly supervised scenarios are discussed, such as data imbalance, weakly supervised regression, mixed weakly supervised learning, large-scale deep-level tasks, and learning problems in open environments.
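
As a rough illustration of the "problem transformation" idea named in the abstract, the sketch below applies binary relevance: a multi-label task is split into one independent binary SVM per label. This is a minimal sketch under stated assumptions, not one of the algorithms evaluated in the survey. The linear SVM is trained with a simple Pegasos-style stochastic subgradient method, and the toy data and all function names (`train_linear_svm`, `fit_binary_relevance`, `predict`) are hypothetical and introduced only for this example.

```python
import random

def train_linear_svm(X, y, lam=0.1, epochs=300, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    X: list of feature tuples (bias feature already appended); y: list of +1/-1.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink step from the regularizer, applied on every iteration.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step, applied only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def add_bias(x):
    """Append a constant feature so the bias is learned as an ordinary weight."""
    return tuple(x) + (1.0,)

def fit_binary_relevance(X, Y, labels):
    """Train one binary SVM per label: +1 if the label is in the sample's label set."""
    Xb = [add_bias(x) for x in X]
    return {
        lab: train_linear_svm(Xb, [1 if lab in ys else -1 for ys in Y])
        for lab in labels
    }

def predict(models, x):
    """Return the set of labels whose SVM gives a positive score for x."""
    xb = add_bias(x)
    return {
        lab for lab, w in models.items()
        if sum(wj * xj for wj, xj in zip(w, xb)) > 0
    }

# Hypothetical toy data: label "a" holds when x[0] > 0, label "b" when x[1] > 0.
X_TOY = [(2, 2), (2, -2), (-2, 2), (-2, -2), (3, 1), (1, -3), (-3, 1), (-1, -3)]
Y_TOY = [{"a", "b"}, {"a"}, {"b"}, set(), {"a", "b"}, {"a"}, {"b"}, set()]

models = fit_binary_relevance(X_TOY, Y_TOY, ["a", "b"])
print(predict(models, (5, 5)))
```

Binary relevance ignores correlations between labels, which is one reason algorithm adaptation approaches, which modify the SVM itself, are treated as a separate category.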

English keywords:

weakly supervised scenarios; support vector machine (SVM); semi-supervised learning; multiple instance learning; multi-label learning

Authors:

Ding Shifei (丁世飞), Sun Yuting (孙玉婷), Liang Zhizhen (梁志贞), Guo Lili (郭丽丽), Zhang Jian (张健), Xu Xiao (徐晓)

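Author affiliations:

School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116

Mine Digitization Engineering Research Center of the Ministry of Education (China University of Mining and Technology), Xuzhou, Jiangsu 221116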

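Keywords (original Chinese):

弱监督场景 (weakly supervised scenarios); 支持向量机 (support vector machine); 半监督学习 (semi-supervised learning); 多示例学习 (multiple instance learning); 多标记学习 (multi-label learning)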

Funding:

National Natural Science Foundation of China (four grants)

Grant numbers:

62276265; 61976216; 62206297; 62206296

Publication year:

2024

DOI:

10.11897/SP.J.1016.2024.00987

Chinese Journal of Computers (计算机学报)

China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 3.18

ISSN: 0254-4164

Year, volume (issue): 2024, 47(5)

Number of references: 9