Generalization Error Estimation and Filtering Algorithm for Regression with Noisy Labels
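JIANG Gaoxia¹, LI Zhengying¹, WANG Wenjian²
Author information
- 1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
- 2. School of Computer and Information Technology, Shanxi University, Taiyuan 030006; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006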
Abstract
When regression data contain numerical label noise, traditional generalization error estimation methods no longer apply, and the generalization performance of the regression model can no longer be guaranteed. This paper proposes a generalization error estimation method for regression models under label noise and, building on it, designs the adaptive Gaussian kernel noise estimator and sample recall filtering (AGKSRF) algorithm. Based on the proposed Craven-Wahba (CW) generalization error estimate, a CW sample selection framework is proposed. An adaptive Gaussian kernel (AGK) estimator of label noise is then developed from the idea of maximum a posteriori estimation and an adaptive nearest-neighbor method. Within the proposed framework, AGKSRF first filters out samples with large noise. Because some clean samples may be wrongly removed in this first pass, AGKSRF then recalls removed samples according to the model's error on them and filters again. Experimental results on benchmark datasets show that AGKSRF improves the ability to reduce model error by 6 to 51 percentage points. AGKSRF can also identify erroneous labels in an age estimation dataset. AGKSRF can therefore effectively improve data quality.
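The abstract names three ingredients: a Craven-Wahba (CW) error estimate, an adaptive Gaussian kernel (AGK) noise estimate, and a filter-then-recall loop. The Python sketch below illustrates how such pieces can fit together under our own assumptions; it is not the authors' AGKSRF. It uses the classical CW generalized cross-validation (GCV) score of a linear smoother (not the paper's noise-aware CW variant), a Nadaraya-Watson smoother whose Gaussian bandwidth at each point is the distance to its k-th nearest neighbor (a stand-in for AGK, without the maximum a posteriori step), leave-one-out residuals as the noise estimate, and a fixed drop fraction followed by a single recall pass. All function names and parameters (`filter_with_recall`, `ks`, `drop_frac`, the recall threshold `tau`) are hypothetical.

```python
import math


def knn_bandwidth(X, x, k, skip=None):
    """Adaptive bandwidth: distance from x to its k-th nearest point in X (index `skip` excluded)."""
    d = sorted(math.dist(x, xj) for j, xj in enumerate(X) if j != skip)
    return max(d[min(k, len(d)) - 1], 1e-12)


def nw_weights(X, x, h):
    """Normalized Gaussian kernel weights of every point in X for the query x."""
    w = [math.exp(-math.dist(x, xj) ** 2 / (2 * h * h)) for xj in X]
    s = sum(w)
    return [wi / s for wi in w]


def smoother_matrix(X, k):
    """Row i holds the smoother weights at x_i; fitted value f_i = sum_j W[i][j] * y[j]."""
    return [nw_weights(X, xi, knn_bandwidth(X, xi, k, skip=i)) for i, xi in enumerate(X)]


def gcv(W, y):
    """Classical Craven-Wahba GCV score (RSS/n) / (1 - tr(W)/n)^2, plus the fitted values."""
    n = len(y)
    fit = [sum(wij * yj for wij, yj in zip(row, y)) for row in W]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fit))
    tr = sum(W[i][i] for i in range(n))
    return (rss / n) / (1 - tr / n) ** 2, fit


def filter_with_recall(X, y, ks=(3, 5, 8), drop_frac=0.2):
    """Hypothetical filter-then-recall loop; returns the sorted indices of retained samples."""
    n = len(y)
    # 1) Choose the neighbor count k whose smoother has the smallest GCV score.
    best = None
    for k in ks:
        W = smoother_matrix(X, k)
        score, fit = gcv(W, y)
        if best is None or score < best[0]:
            best = (score, k, W, fit)
    _, k, W, fit = best

    # 2) Noise estimate: leave-one-out residual |y_i - f_i| / (1 - W_ii) of the smoother.
    noise = [abs(y[i] - fit[i]) / (1 - W[i][i]) for i in range(n)]
    order = sorted(range(n), key=lambda i: noise[i], reverse=True)
    n_drop = int(drop_frac * n)
    removed, kept = order[:n_drop], order[n_drop:]
    tau = max(noise[i] for i in kept)  # largest noise estimate that survived the first pass

    # 3) Recall: predict each removed label from the kept samples only and
    #    bring the sample back if its error does not exceed tau.
    Xk, yk = [X[i] for i in kept], [y[i] for i in kept]
    recalled = []
    for i in removed:
        w = nw_weights(Xk, X[i], knn_bandwidth(Xk, X[i], k))
        pred = sum(wj * yj for wj, yj in zip(w, yk))
        if abs(y[i] - pred) <= tau:
            recalled.append(i)
    return sorted(kept + recalled)


if __name__ == "__main__":
    X = [(i / 10,) for i in range(30)]
    y = [2 * x[0] for x in X]
    y[15] += 10.0  # corrupt one label
    kept = filter_with_recall(X, y)
    print(15 in kept, len(kept))
```

On the toy line y = 2x with one corrupted label, the corrupted index 15 is filtered out and stays out after recall. Here GCV is used only to pick k, which is the textbook use of the Craven-Wahba criterion; deciding how many samples to drop is where the paper's own CW sample selection framework would come in, and that step is not reproduced in this sketch.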
Key words
regression with noisy labels/generalization error estimation/adaptive Gaussian kernel estimator/sample recall filtering
Publication year
2025