Semi-Supervised Dimensionality Reduction Based on Sparse Discriminant Graph and Self-Paced Learning
Gu Nannan (古楠楠) 1, Xing Mengjie (邢梦洁) 1, Lin Peng (林鹏) 1, Chen Haibao (陈海宝) 2
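Author information
- 1. School of Statistics and Data Science, Capital University of Economics and Business, Beijing 100070
- 2. School of Integrated Circuits (School of Information and Electronic Engineering), Shanghai Jiao Tong University, Shanghai 200240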
Abstract
Graph-based semi-supervised dimensionality reduction is a class of methods that use a graph of the data structure to address semi-supervised dimensionality reduction. However, the graphs used by most of these algorithms take account of data information only and ignore class label information. These algorithms also do not account for differences among samples during training, which reduces their robustness to noise or outliers. This paper combines sparse representation with self-paced learning and proposes a self-paced learner that obtains a linear dimensionality reduction mapping from a sparse discriminant graph. Specifically, the proposed method first constructs a sparse discriminant graph by integrating the propagation of class labels into the sparse representation of the data. Then, by considering the distance between each low-dimensional data point and its class anchor, together with the ability of the low-dimensional data to preserve the discriminative sparse structure of the original high-dimensional data, it formulates a self-paced learning problem for dimensionality reduction. On the one hand, the sparse discriminant graph extracts the discriminative information in the data more effectively. On the other hand, the self-paced learning mechanism automatically computes importance values for the training data, suppresses the negative impact of unreliable data or labels, and improves the robustness of the model to noise or outliers. Results on five experimental data sets demonstrate the effectiveness of the proposed algorithm.
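The abstract does not give the paper's objective function, so the following is only a generic illustration of how a self-paced learning scheme can assign importance values to training samples. It uses the standard hard and linear soft weighting rules from the self-paced learning literature, not necessarily the rule used in this paper. The function name `self_paced_weights`, the age parameter `lam`, and the loss values are all hypothetical.

```python
import numpy as np

def self_paced_weights(losses, lam, soft=False):
    """Generic self-paced sample weights (an assumption, not the paper's scheme).

    losses: per-sample losses under the current model.
    lam:    "age" parameter; samples with loss >= lam are treated as unreliable.
    """
    losses = np.asarray(losses, dtype=float)
    if soft:
        # Linear soft weighting: weight falls from 1 to 0 as the loss approaches lam.
        return np.clip(1.0 - losses / lam, 0.0, 1.0)
    # Hard weighting: keep only samples whose loss is below lam.
    return (losses < lam).astype(float)

# Hypothetical per-sample losses; the third sample behaves like an outlier.
losses = np.array([0.1, 0.4, 2.5, 0.3])
hard = self_paced_weights(losses, lam=1.0)
soft = self_paced_weights(losses, lam=1.0, soft=True)
```

In a typical self-paced loop, these weights would be recomputed after each model update while `lam` is gradually increased, so that harder samples enter training later and samples with persistently large loss keep low weight.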
Key words
Semi-supervised dimension reduction/sparse representation/self-paced learning/graph-based semi-supervised learning
Publication year
2025