Survey on Support Vector Machine Algorithms in Weakly Supervised Scenarios
Support Vector Machine (SVM) is a statistical learning method based on the principle of structural risk minimization. It offers an intuitive geometric interpretation and a rigorous mathematical derivation, and shows distinctive advantages in handling nonlinear, few-shot, and high-dimensional problems. SVM has attracted significant attention and has been widely applied in fields such as image recognition, fault diagnosis, and text classification. As a classical supervised machine learning algorithm, SVM trains the learner on samples with complete, unique, and unambiguous ground-truth labels to ensure its generalization ability. However, as real-world application tasks become increasingly complex, building such a sample set is laborious and difficult. On the one hand, data collection, cleaning, and debugging require considerable time and cost, and in specialized domains, especially the medical field, experts often need to draw on domain knowledge to process and label the samples. On the other hand, real-world learning tasks often change and evolve. For example, data annotation criteria, annotation granularity, or downstream use cases may change frequently, requiring samples to be re-labeled. Consequently, because labeling is costly, a large number of samples in real-world applications lack complete and unambiguous labels. Moreover, samples in most practical task scenarios may exhibit polysemy, that is, a single sample can be associated with multiple labels at the same time. Therefore, standard SVM struggles to achieve satisfactory performance in weakly supervised scenarios such as incomplete supervision, inexact supervision, and polysemous supervision. Weakly supervised scenarios stand in contrast to supervised ones: instead of relying on fully and precisely labeled data, learning algorithms in weakly supervised scenarios train the learner on samples whose labels may be limited, ambiguous, or only coarse. From the perspective of weakly supervised scenarios, this survey systematically reviews the current research status and development of SVM algorithms. First, the concept of weakly supervised scenarios and the basic mathematical principle of SVM are briefly introduced. Second, existing SVM algorithms in weakly supervised scenarios are divided into three categories according to their learning paradigm: semi-supervised learning based methods, multiple instance learning based methods, and multi-label learning based methods. Specifically, semi-supervised learning based methods are further subdivided, according to their data assumptions, into cluster assumption based approaches and manifold assumption based approaches. Multiple instance learning based methods are further classified, according to their problem-solving strategy, into instance-level, bag-level, and embedded-space approaches. Multi-label learning based methods are further divided, according to their processing idea, into problem transformation based approaches and algorithm adaptation based approaches. The survey describes the representative methods in each category in detail and summarizes and analyzes their characteristics and shortcomings, providing a basis for selecting appropriate SVM methods for different task scenarios. The performance of several representative algorithms is then evaluated and analyzed through experiments on publicly available datasets. Finally, potential directions for the future development of SVM algorithms in weakly supervised scenarios are discussed, including data imbalance, weakly supervised regression, mixed weakly supervised learning, large-scale deep-level tasks, and learning in open environments.
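To make two of the ideas above concrete (the SVM objective and the "problem transformation" category of multi-label methods), here is a minimal pure-Python sketch under stated assumptions: a linear kernel, tiny toy data, and binary relevance (one independent binary SVM per label) as the transformation strategy. Binary relevance is a common problem transformation technique chosen here for illustration; it is not necessarily one of the methods reviewed in the survey. The solver is a Pegasos-style stochastic subgradient method (without its optional projection step) on the soft-margin objective lam/2 * ||w||^2 + mean hinge loss, i.e. a regularization term controlling capacity plus an empirical risk term. All function names (`train_linear_svm`, `train_binary_relevance`, `predict_labels`) and parameter values (`lam`, `epochs`, `seed`) are hypothetical.

```python
import random
from typing import List, Sequence, Set

# Illustrative sketch only: hypothetical helper names, linear kernel, toy data.
Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     epochs: int = 500, seed: int = 0) -> Vector:
    """Minimize lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>) with a
    Pegasos-style stochastic subgradient method (no projection step).
    Labels must be -1/+1. A constant 1.0 feature is appended so the last
    weight acts as a bias (as a simplification, it is also regularized)."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                       # Pegasos step size
            margin = label * dot(w, x)                  # margin before the update
            w = [(1.0 - eta * lam) * wi for wi in w]    # regularizer: shrink w
            if margin < 1.0:                            # hinge active: move toward label * x
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Vector, x: Sequence[float]) -> int:
    return 1 if dot(w, list(x) + [1.0]) >= 0.0 else -1


def train_binary_relevance(X: List[Vector], Y: List[Set[int]], n_labels: int,
                           **svm_kwargs) -> List[Vector]:
    """Problem transformation: one binary SVM per label k, trained with
    y = +1 if k is in the sample's label set and y = -1 otherwise."""
    models = []
    for k in range(n_labels):
        y_k = [1 if k in labels else -1 for labels in Y]
        models.append(train_linear_svm(X, y_k, **svm_kwargs))
    return models


def predict_labels(models: List[Vector], x: Sequence[float]) -> Set[int]:
    # A sample receives every label whose binary SVM votes positive.
    return {k for k, w in enumerate(models) if predict(w, x) == 1}


# Toy multi-label data: label 0 iff feature 0 > 0, label 1 iff feature 1 > 0.
X = [[2.0, 2.0], [2.0, -2.0], [-2.0, 2.0], [-2.0, -2.0],
     [3.0, 1.0], [1.0, -3.0], [-1.0, 3.0], [-3.0, -1.0]]
Y = [{0, 1}, {0}, {1}, set(), {0, 1}, {0}, {1}, set()]
models = train_binary_relevance(X, Y, n_labels=2)
print(predict_labels(models, [5.0, 5.0]))  # expected: {0, 1}
```

Because each label gets its own independent classifier, binary relevance ignores correlations between labels; capturing such correlations is one motivation for algorithm adaptation approaches.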