
Self-supervised active label cleaning

Active label cleaning applies active learning to label-noise handling in order to reduce the cost of manual annotation. However, existing active label cleaning methods still incur a high cost of extra manual annotation: a large proportion of the suspicious samples they select are in fact correctly labeled. To alleviate this problem, a self-supervised active label cleaning method based on core-sets was proposed. First, self-supervised tasks were used for representation learning on all samples, which were then mapped into a feature space. Next, suspicious samples were identified with a greedy K-Center set-covering method, and finally, label-noise samples were selected from them for relabeling according to uncertainty. By accounting for both the representativeness and the uncertainty of samples, the method effectively lowers the proportion of correctly labeled samples among the suspicious ones. Experimental results on public datasets with varying proportions of label noise showed that the method markedly reduced the cost of extra manual annotation in every iteration, while also mitigating the cold-start problem to some extent. In addition, ablation experiments confirmed the effectiveness of the self-supervised core-set sampling module and the uncertainty prediction module.
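The abstract only outlines the selection pipeline. Below is a minimal sketch of its two selection stages, under stated assumptions: the feature vectors are taken as already produced by some self-supervised encoder (not shown); the core-set step is the standard greedy K-Center (farthest-point) heuristic; and "uncertainty" is stood in for by the predictive entropy of a classifier's softmax outputs, which may differ from the paper's uncertainty prediction module. All function names (`k_center_greedy`, `predictive_entropy`, `select_for_relabeling`, `clean_fraction`) are hypothetical.

```python
import numpy as np


def k_center_greedy(features: np.ndarray, budget: int, seed_idx: int = 0) -> list[int]:
    """Greedy K-Center core-set selection (farthest-point heuristic).

    Starting from one seed point, repeatedly add the point whose distance
    to its nearest already-selected center is largest.
    """
    n = features.shape[0]
    budget = min(budget, n)
    selected = [seed_idx]
    # Distance from every point to its nearest selected center so far.
    min_dist = np.linalg.norm(features - features[seed_idx], axis=1)
    while len(selected) < budget:
        idx = int(np.argmax(min_dist))
        if min_dist[idx] == 0:
            break  # every point is already covered exactly; stop early
        selected.append(idx)
        new_dist = np.linalg.norm(features - features[idx], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected


def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Row-wise entropy of class-probability vectors (assumed uncertainty score)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def select_for_relabeling(features: np.ndarray, probs: np.ndarray,
                          budget: int, top_m: int) -> list[int]:
    """Pick suspicious samples via K-Center, then keep the top_m most uncertain."""
    suspicious = k_center_greedy(features, budget)
    ent = predictive_entropy(probs[suspicious])
    order = np.argsort(-ent)[:top_m]  # highest entropy first
    return [suspicious[i] for i in order]


def clean_fraction(selected: list[int], noisy_labels: np.ndarray,
                   true_labels: np.ndarray) -> float:
    """Share of selected samples whose given label was already correct."""
    sel = np.asarray(selected)
    return float(np.mean(noisy_labels[sel] == true_labels[sel]))
```

`clean_fraction` mirrors the quantity the abstract treats as the extra-annotation cost, namely the share of correctly labeled samples among those sent for relabeling; it can only be computed in experiments where ground-truth labels are known. Covering the feature space first (representativeness) and then ranking by uncertainty is one way to combine the two criteria the abstract names.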

Keywords: active learning; self-supervised learning; label noise; label cleaning; cost of extra manual annotation

Lin Xiao (林晓), Zhang Qiuyang (张秋阳), Zheng Xiaomei (郑晓妹), Yang Qizhe (杨启哲)


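College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China

Shanghai Engineering Research Center of Intelligent Education and Big Data, Shanghai Normal University, Shanghai 200234, China

Research Base of Online Education for Shanghai Middle and Primary Schools, Shanghai 200234, China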


Funding: Shanghai Special Project for Promoting High-Quality Industrial Development (2211106)

2024

Journal of Graphics
China Graphics Society


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.73
ISSN: 2095-302X
Year, volume (issue): 2024, 45(3)