Active label cleaning applies active learning to label noise processing in order to lower the cost of manual annotation. However, existing active label cleaning methods still incur a high cost of extra manual annotation, mainly because a high proportion of the samples they select as suspicious are in fact correctly labeled. To address this problem, a self-supervised active label cleaning method based on core-set was proposed. Firstly, self-supervised tasks were employed for representation learning on all samples, and the samples were mapped to a feature space. Suspicious samples were then identified using a greedy K-Center set covering method, and label noise samples were selected from them for re-labeling based on uncertainty. By considering both the representativeness and the uncertainty of samples, this method could effectively lower the proportion of correctly labeled samples among the suspicious ones. Experimental results on public datasets with varying proportions of label noise demonstrated that the proposed method could significantly reduce the cost of extra manual annotation in each iteration, while also mitigating the cold start problem to some extent. Additionally, the effectiveness of the self-supervised core-set sampling module and the uncertainty prediction module in this method was validated through ablation experiments.
active learning; self-supervised learning; label noise; label cleaning; cost of extra manual annotation
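
The two-stage selection described in the abstract (greedy K-Center covering in feature space, then uncertainty-based selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Euclidean distance, the fixed starting center, and the use of prediction entropy as the uncertainty score are all assumptions made for illustration, and the features are assumed to come from some self-supervised encoder that is not shown here.

```python
import math


def greedy_k_center(features, k, first=0):
    """Greedy K-Center: repeatedly add the point farthest from its nearest center.

    `features` is a list of equal-length numeric vectors (assumed to be
    embeddings from a self-supervised encoder). Returns selected indices.
    """
    def dist(a, b):
        # Euclidean distance (an assumption; the abstract names no metric).
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    centers = [first]
    # min_dist[i] = distance from point i to its nearest selected center.
    min_dist = [dist(f, features[first]) for f in features]
    while len(centers) < min(k, len(features)):
        nxt = max(range(len(features)), key=lambda i: min_dist[i])
        centers.append(nxt)
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], dist(f, features[nxt]))
    return centers


def entropy(probs):
    """Shannon entropy of a predicted class distribution (assumed uncertainty score)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_relabeling(features, probs, k, budget):
    """Pick k representative (suspicious) samples, then keep the most uncertain ones."""
    suspicious = greedy_k_center(features, k)
    ranked = sorted(suspicious, key=lambda i: entropy(probs[i]), reverse=True)
    return ranked[:budget]
```

Under these assumptions, `select_for_relabeling(features, probs, k, budget)` returns the indices of at most `budget` samples to send for manual re-labeling in one iteration; `k` controls how many representative candidates the core-set stage proposes before the uncertainty filter is applied.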