Generalization Error Estimation and Filtering Algorithm for Regression with Noisy Labels
When regression labels are corrupted by numerical noise, traditional generalization error estimation methods become inapplicable, and the generalization performance of the regression model can rarely be guaranteed. In this paper, a generalization error estimation method is proposed for regression models trained on noisy labels, and an adaptive Gaussian kernel noise estimator and sample recall filtering (AGKSRF) algorithm is designed on top of it. Based on the proposed Craven-Wahba (CW) generalization error estimate, a CW sample selection framework is proposed. An adaptive Gaussian kernel (AGK) estimator of label noise is developed from the ideas of maximum a posteriori estimation and the adaptive nearest neighbor method. AGKSRF filters out large-noise samples by integrating the AGK estimator with the proposed framework. Because some clean samples might be wrongly removed in the first filtering pass, AGKSRF then recalls removed samples and filters the kept samples again according to the model error. Experimental results on benchmark datasets show that AGKSRF reduces the model error by 6 to 51 percentage points. AGKSRF can also identify erroneous labels in an age estimation dataset. These results indicate that AGKSRF can effectively improve data quality.
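The filter-then-recall idea summarized above can be illustrated with a simplified sketch. This is not the AGKSRF algorithm itself: the abstract does not give the CW error estimate or the exact AGK estimator, so the code below substitutes a Gaussian-kernel-weighted nearest-neighbor label average as the noise estimate, ordinary least squares as the regression model, and a robust three-sigma residual rule for the recall step. The function names `agk_noise_estimate` and `filter_and_recall`, the parameters `k` and `drop_frac`, and the threshold are all assumptions made for illustration.

```python
import numpy as np


def agk_noise_estimate(X, y, k=10):
    """Illustrative noise estimate: |y_i - kernel-weighted mean of neighbor labels|.

    The bandwidth adapts per sample to the distance of its k-th nearest
    neighbor (an assumed stand-in for the paper's adaptive Gaussian kernel).
    """
    # Pairwise Euclidean distances; a sample is never its own neighbor.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]          # indices of the k nearest neighbors
    nd = np.take_along_axis(d, idx, axis=1)     # their distances, ascending
    h = nd[:, -1:] + 1e-12                      # adaptive bandwidth per sample
    w = np.exp(-0.5 * (nd / h) ** 2)            # Gaussian kernel weights
    y_hat = (w * y[idx]).sum(axis=1) / w.sum(axis=1)
    return np.abs(y - y_hat)


def filter_and_recall(X, y, k=10, drop_frac=0.2):
    """Two-pass filtering: drop large-noise samples, then recall by model error.

    Returns two boolean masks over the samples: (kept after first pass,
    kept after recall and re-filtering).
    """
    # Pass 1: remove the samples with the largest estimated noise.
    noise = agk_noise_estimate(X, y, k)
    first_keep = noise <= np.quantile(noise, 1.0 - drop_frac)

    # Fit a linear model (assumed stand-in) on the kept samples only.
    A = np.c_[X, np.ones(len(X))]
    coef = np.linalg.lstsq(A[first_keep], y[first_keep], rcond=None)[0]

    # Pass 2: judge every sample by model error. Removed samples with a
    # small residual are recalled; kept samples with a large one are dropped.
    resid = np.abs(A @ coef - y)
    sigma = 1.4826 * np.median(resid[first_keep])  # robust scale estimate
    final_keep = resid <= 3.0 * sigma
    return first_keep, final_keep
```

Calling `filter_and_recall(X, y)` on a feature matrix `X` of shape `(n, d)` and a label vector `y` of shape `(n,)` yields the two masks; comparing them shows which samples were recalled or newly removed in the second pass. The pairwise distance matrix makes this sketch quadratic in `n`, so it is only suitable for small datasets.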