Label Noise Filtering Framework Based on Outlier Detection

Noise is an important factor affecting the reliability of machine learning models, and label noise has a more decisive influence on model training than feature noise. Reducing label noise is therefore a key step in classification tasks. Noise filtering is an effective way to handle label noise: it neither requires estimating the noise rate nor relies on any loss function. However, most existing label noise filtering algorithms suffer from overcleaning. To address this problem, this paper proposes a label noise filtering framework based on outlier detection and, within that framework, presents AdNN (Label Noise Filtering via Adaptive Nearest Neighbor Clustering). AdNN considers each class of the classification problem separately and recasts label noise detection as outlier detection. It first identifies the outliers of each class, then uses relative density to remove the non-noisy samples from those outliers, which yields a noise candidate set. Finally, it applies a noise factor to identify and filter the noisy samples in the candidate set. Experiments on synthetic and public benchmark datasets show that the proposed method alleviates overcleaning while achieving good noise filtering and classification prediction performance.
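The three stages described in the abstract (per-class outlier detection, relative-density screening, noise-factor filtering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the AdNN algorithm itself: the abstract does not define adaptive nearest neighbor clustering, relative density, or the noise factor, so the sketch substitutes simple k-nearest-neighbor distance rules for all three. The function name `filter_label_noise` and the parameters `k`, `outlier_ratio`, and `noise_threshold` are hypothetical.

```python
import numpy as np

def filter_label_noise(X, y, k=3, outlier_ratio=3.0, noise_threshold=0.5):
    """Return a boolean mask marking samples flagged as label noise.

    All three criteria below are assumed stand-ins, not the paper's definitions.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances; a sample is never its own neighbour.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    noisy = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        other = np.flatnonzero(y != c)
        if len(idx) <= k or len(other) < k:
            continue  # too few samples to compare neighbourhoods
        # Mean distance to the k nearest same-class / other-class samples.
        d_same = np.sort(D[np.ix_(idx, idx)], axis=1)[:, :k].mean(axis=1)
        d_other = np.sort(D[np.ix_(idx, other)], axis=1)[:, :k].mean(axis=1)
        # Stage 1 (assumed rule): outlier if far from its own class.
        is_outlier = d_same > outlier_ratio * np.median(d_same)
        # Stage 2 (assumed rule): keep only outliers lying in a region that is
        # denser in other classes than in their own class.
        rel_density = d_same / np.maximum(d_other, 1e-12)
        is_candidate = is_outlier & (rel_density > 1.0)
        # Stage 3 (assumed rule): fraction of the k nearest neighbours
        # (any class) that carry a different label.
        nn = np.argsort(D[idx], axis=1)[:, :k]
        noise_factor = (y[nn] != c).mean(axis=1)
        noisy[idx] = is_candidate & (noise_factor >= noise_threshold)
    return noisy

# Toy data: two 5x5 grids, one swapped label in each, plus one isolated
# but correctly labelled class-0 point.
grid = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
X = np.vstack([grid, grid + [20, 0], [[2.0, -10.0]]])
y = np.array([0] * 25 + [1] * 25 + [0])
y[12] = 1  # centre of the first grid, mislabelled as class 1
y[37] = 0  # centre of the second grid, mislabelled as class 0
flagged = np.flatnonzero(filter_label_noise(X, y)).tolist()
print(flagged)  # the two swapped labels: [12, 37]
```

In this toy setup the isolated point (index 50) is an outlier of its own class in stage 1 but is dropped in stage 2 because it is not close to any other class either. That is the kind of sample an outlier-only filter would remove unnecessarily, which is the overcleaning behaviour the abstract aims to reduce.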

Keywords: Label noise filtering; Outlier detection; Adaptive k-nearest neighbors; Relative density; Noise factor

Xu Maolong, Jiang Gaoxia, Wang Wenjian

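School of Computer and Information Technology, Shanxi University, Taiyuan 030006

Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan 030006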

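Funding: National Natural Science Foundation of China (U21A20513, 62076154, 61906113); Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (2020L0007)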

2024

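Computer Science
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)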

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(2)