Revisiting Test Sample Selection for CNN Under Model Calibration
Deep neural networks are widely used in various tasks, and model testing is crucial to ensure their quality. Test sample selection alleviates the labor-intensive manual labeling problem by strategically choosing a small set of data to label. However, existing selection metrics based on predictive uncertainty neglect how accurately that uncertainty is estimated. To fill this gap, we conduct a systematic empirical study on 3 widely used datasets and 4 convolutional neural networks (CNNs) to reveal the relationship between model calibration and the predictive uncertainty metrics used in test sample selection. We then compare the quality of the test subsets selected by calibrated and uncalibrated models. The findings indicate a degree of correlation between uncertainty metrics and model calibration in CNN models. Moreover, CNN models with better calibration select higher-quality test subsets than models with poor calibration. Specifically, the calibrated model outperforms the uncalibrated model in detecting misclassified samples in 70.57% of the experiments. Our study emphasizes the importance of considering model calibration in test selection and highlights the potential benefit of using a calibrated model to improve the adequacy of the testing process.
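To make the two notions the abstract relies on concrete, the sketch below pairs an entropy-based predictive uncertainty score for test sample selection with the expected calibration error (ECE), a standard calibration measure. This is a minimal, self-contained illustration in NumPy; the function names, the random toy predictions, and the selection budget are hypothetical and do not reproduce the paper's actual experimental pipeline or metrics.

    import numpy as np

    def predictive_entropy(probs):
        # Uncertainty metric: entropy of the softmax output for each sample.
        return -np.sum(probs * np.log(probs + 1e-12), axis=1)

    def select_uncertain(probs, budget):
        # Test sample selection: pick the `budget` most uncertain samples.
        scores = predictive_entropy(probs)
        return np.argsort(-scores)[:budget]

    def expected_calibration_error(probs, labels, n_bins=15):
        # ECE: confidence/accuracy gap, weighted over equal-width confidence bins.
        confidences = probs.max(axis=1)
        predictions = probs.argmax(axis=1)
        accuracies = (predictions == labels).astype(float)
        bins = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(bins[:-1], bins[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
        return ece

    # Toy usage: random "model outputs" stand in for real CNN predictions.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(1000, 10))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    labels = rng.integers(0, 10, size=1000)
    subset = select_uncertain(probs, budget=100)
    print("ECE:", expected_calibration_error(probs, labels))
    print("Selected test indices:", subset[:10])

In this framing, a poorly calibrated model produces distorted softmax probabilities, so an uncertainty score computed from them can misrank samples and yield a lower-quality selected subset, which is the relationship the study examines empirically.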