A noisy label deep learning algorithm based on K-means clustering and feature space augmentation

The performance of neural networks in deep learning relies on high-quality samples, and noisy labels reduce a network's classification accuracy. Noisy label learning algorithms have been proposed to reduce this impact. Such algorithms first divide the training set into a clean sample set and a noisy sample set, and then use a semi-supervised learning algorithm to assign pseudo-labels to the noisy samples. However, incorrect pseudo-labels and an insufficient number of training samples still limit their performance. To address these problems, a noisy label deep learning algorithm based on K-means clustering and feature space augmentation is proposed. First, the algorithm applies K-means clustering to the clean sample set, clustering by label, and screens out hard-to-classify noisy samples according to their distance to the cluster centers, which improves the quality of the training samples. Second, the mixup algorithm is used to expand both the clean and noisy sample sets, which increases the number of training samples. Finally, a feature space augmentation algorithm suppresses the noisy samples newly generated by mixup, which improves the network's classification accuracy. Experiments on four datasets (CIFAR10, CIFAR100, MNIST, and ANIMAL-10) validate the effectiveness of the proposed algorithm.
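The first two steps of the abstract can be illustrated with a minimal NumPy sketch. It is only one possible reading, not the authors' implementation: the abstract does not give the selection rule, so the relative-gap criterion and the `margin` threshold in `split_by_center_distance` are assumptions, as are all function names and the choice to seed K-means with per-class means of the clean set. The `mixup` function follows the standard formulation (a convex combination of two samples and their labels with a Beta-distributed weight). The feature space augmentation step is left out because the abstract does not describe how it works.

```python
import numpy as np


def kmeans(features, init_centers, n_iter=20):
    """Plain Lloyd's K-means started from the given centers.

    Assumption: init_centers are the per-class means of the clean set,
    so each cluster stays tied to one label.
    """
    centers = init_centers.astype(float).copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for k in range(len(centers)):
            members = features[assign == k]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)
    return centers


def split_by_center_distance(noisy_features, centers, margin=0.2):
    """Flag noisy samples that sit clearly closer to one center.

    Hypothetical criterion: a sample is "hard to classify" when its two
    nearest centers are almost equally far away (relative gap < margin).
    Returns (keep_mask, index_of_nearest_center).
    """
    diffs = noisy_features[:, None, :] - centers[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    sorted_d = np.sort(dists, axis=1)
    nearest, second = sorted_d[:, 0], sorted_d[:, 1]
    gap = (second - nearest) / (second + 1e-12)
    return gap >= margin, dists.argmin(axis=1)


def mixup(x, y_onehot, alpha=1.0, rng=None):
    """Standard mixup: blend each sample with a randomly chosen partner."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)      # mixing weight in [0, 1]
    perm = rng.permutation(len(x))    # partner index for each sample
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam
```

In this sketch the features would typically be embeddings taken from the network rather than raw pixels, and noisy samples with `keep_mask == False` would be withheld from pseudo-labeling. Both choices are assumptions made for the illustration.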

Keywords: noisy label learning; deep learning; semi-supervised learning; machine learning; neural network; K-means clustering; feature space augmentation; mixup algorithm

Authors: Lyu Jia (吕佳), Qiu Xiaolong (邱小龙)

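College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China

Chongqing Engineering Technology Research Center for Digital Agriculture Services, Chongqing 401331, China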

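Funding:
Major Program of the National Natural Science Foundation of China (11991024)
Chongqing Municipal Education Commission "Chengdu-Chongqing Economic Circle Construction" Science and Technology Innovation Project (KJCX2020024)
Chongqing Higher Education Innovation Research Group Funding Project (CXQT20015)
Key Scientific Research Project of the Chongqing Municipal Education Commission (KJZD-K202200511)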

2024

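CAAI Transactions on Intelligent Systems (智能系统学报)
Chinese Association for Artificial Intelligence; Harbin Engineering University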

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.672
ISSN: 1673-4785
Year, volume (issue): 2024, 19(2)