Generalization Error Estimation and Filtering Algorithm for Regression with Noisy Labels


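When regression data contain numerical label noise, traditional generalization error estimation methods no longer apply, and the generalization performance of the regression model is no longer guaranteed. This paper proposes a generalization error estimation method for regression models under label noise and designs the adaptive Gaussian kernel noise estimator and sample recall filtering (AGKSRF) algorithm. Building on the proposed Craven-Wahba (CW) generalization error estimate, a CW sample selection framework is proposed. Based on maximum a posteriori (MAP) estimation and an adaptive nearest-neighbor method, an adaptive Gaussian kernel (AGK) estimator of label noise is developed. Within the proposed framework, AGKSRF first filters out samples with large noise. Because some clean samples may be wrongly removed in this first pass, AGKSRF then recalls samples and filters them again according to the model's error on the filtered samples. Experimental results on benchmark datasets show that AGKSRF improves the reduction of model error by 6 to 51 percentage points. AGKSRF can also identify erroneous labels in age estimation data. AGKSRF can therefore effectively improve data quality.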
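The CW estimate is named after Craven and Wahba, whose generalized cross-validation (GCV) criterion scores a linear smoother $\hat{y} = Hy$ by $\mathrm{GCV} = \frac{1}{n}\lVert (I-H)y \rVert^2 / (1 - \operatorname{tr}(H)/n)^2$. As background only, the minimal sketch below computes this classical score for ridge regression with NumPy. It assumes clean labels and is not the noise-aware CW estimator proposed in the paper, whose exact form the abstract does not give; the helper names `ridge_hat_matrix` and `gcv_score` are hypothetical.

```python
import numpy as np


def ridge_hat_matrix(X: np.ndarray, lam: float) -> np.ndarray:
    """Hat matrix H = X (X^T X + lam * I)^{-1} X^T, so that y_hat = H @ y."""
    d = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)


def gcv_score(X: np.ndarray, y: np.ndarray, lam: float) -> float:
    """Classical Craven-Wahba GCV: mean squared residual / (1 - tr(H)/n)^2."""
    n = len(y)
    H = ridge_hat_matrix(X, lam)
    resid = y - H @ y                     # in-sample residuals (I - H) y
    mean_sq_resid = resid @ resid / n
    # The denominator inflates the training error by the smoother's effective
    # degrees of freedom tr(H), approximating the generalization error.
    return float(mean_sq_resid / (1.0 - np.trace(H) / n) ** 2)
```

A typical use is model selection, e.g. `min(grid, key=lambda lam: gcv_score(X, y, lam))` over a grid of penalties. With noisy labels the residual term mixes model error with label noise, which is why the abstract argues that such classical estimates no longer apply directly.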
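The filter-then-recall procedure summarized in the abstract can be illustrated with a small NumPy sketch. Everything in it is an assumption for illustration: the noise estimator is a k-nearest-neighbor Nadaraya-Watson smoother with a Gaussian kernel whose bandwidth adapts to the distance of the k-th neighbor (not the paper's MAP-derived AGK estimator), the regression model is ordinary least squares, the thresholds use a median-based robust scale, and the names `agk_noise_estimate`, `filter_and_recall`, `k`, `drop_frac` and `c` are hypothetical.

```python
import numpy as np


def agk_noise_estimate(X: np.ndarray, y: np.ndarray, k: int = 5) -> np.ndarray:
    """Hypothetical adaptive-Gaussian-kernel noise estimate: y_i minus a
    kernel-weighted mean of its k nearest neighbors' labels, with a
    per-sample bandwidth equal to the distance to the k-th neighbor."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # a sample is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]                # indices of the k nearest neighbors
    nn_d2 = np.take_along_axis(d2, nn, axis=1)        # their squared distances
    h2 = np.maximum(nn_d2[:, -1:], 1e-12)             # adaptive bandwidth^2 per sample
    w = np.exp(-nn_d2 / (2.0 * h2))                   # Gaussian kernel weights
    y_smooth = np.sum(w * y[nn], axis=1) / np.sum(w, axis=1)
    return y - y_smooth


def fit_linear(X: np.ndarray, y: np.ndarray):
    """Least-squares linear model with intercept; returns a predict function."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([Z, np.ones(len(Z))]) @ coef


def filter_and_recall(X, y, k=5, drop_frac=0.1, c=3.0) -> np.ndarray:
    """Boolean mask of samples kept after filtering, recall and re-filtering."""
    n = len(y)
    # Step 1: drop the drop_frac share of samples with the largest estimated noise.
    noise = np.abs(agk_noise_estimate(X, y, k))
    n_drop = int(round(drop_frac * n))
    keep = np.ones(n, dtype=bool)
    keep[np.argsort(noise)[n - n_drop:]] = False
    # Step 2: fit on the kept samples and measure the error on every sample.
    resid = np.abs(y - fit_linear(X[keep], y[keep])(X))
    tau = c * 1.4826 * np.median(resid[keep])         # c times a robust sigma estimate
    # Step 3: recall removed samples that the model already fits well.
    keep |= resid <= tau
    # Step 4: refit on the enlarged set and filter it again.
    resid = np.abs(y - fit_linear(X[keep], y[keep])(X))
    tau = c * 1.4826 * np.median(resid[keep])
    return keep & (resid <= tau)
```

Here `drop_frac` plays the role of the first, deliberately aggressive filtering pass, and the recall step is what lets clean samples removed in that pass return to the training set.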

Keywords: regression with noisy labels; generalization error estimation; adaptive Gaussian kernel estimator; sample recall filtering

Jiang Gaoxia, Li Zhengying, Wang Wenjian


School of Computer and Information Technology, Shanxi University, Taiyuan 030006

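Key Laboratory of Computational Intelligence and Chinese Information Processing of the Ministry of Education, Shanxi University, Taiyuan 030006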


2025

Journal of Chinese Computer Systems
Shenyang Institute of Computing Technology, Chinese Academy of Sciences


Peking University core journal
Impact factor: 0.564
ISSN:1000-1220
Year, volume (issue): 2025, 46(1)