A Divergence-Based Core Dataset Selection Algorithm


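Abstract (translated from Chinese): As deep learning has developed, training datasets have grown steadily larger, making deep neural network training inefficient. To address this, a divergence-based core dataset selection algorithm is proposed, which reduces the original dataset to a core dataset while preserving training performance. The algorithm learns iteratively in a supervised manner, computes a divergence value for each sample through a voting network framework, and ranks the samples by that value to select them. Core dataset selection experiments on the widely used CIFAR, Fashion-MNIST, and SVHN datasets show that the algorithm yields a core dataset one fifth the size of the original while the accuracy of the trained model drops by no more than 5%. In addition, the generalization error of the selected core dataset is only 0.13, indicating better general applicability.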

English title: An Efficient Core-set Selection Algorithm Based on Difference

English abstract: With the development of deep learning, the scale of datasets is growing at an unprecedented speed, and the training process has become inefficient. It is usually necessary to simplify the original dataset while ensuring a similar training effect. In view of this, a core-set selection algorithm based on divergence is proposed. The algorithm learns iteratively in a supervised manner, calculates the divergence value of each sample through a voting network framework, and then sorts the samples by that value to select them. Core-set selection experiments are carried out on the CIFAR, Fashion-MNIST, and SVHN datasets. The results show that the proposed algorithm obtains a core set one fifth the size of the original dataset, while the accuracy of the trained model is reduced by no more than 5%. At the same time, the generalization error of the core dataset is only 0.13, which makes it more generally applicable.
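The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the divergence value or say which end of the ranking is kept, so this example makes two loudly stated assumptions: divergence is measured as vote entropy (a standard committee-disagreement measure from active learning, swapped in here for illustration), and the most contested samples are retained. The names `vote_entropy`, `select_core_set`, and `keep_fraction` are hypothetical, and only a single selection round is shown rather than the paper's iterative procedure.

```python
import numpy as np


def vote_entropy(votes: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-sample disagreement among voter networks (assumed divergence measure).

    votes: integer array of shape (n_samples, n_voters) with predicted labels.
    Returns an array of shape (n_samples,); 0 means all voters agree.
    """
    n_samples, n_voters = votes.shape
    # counts[i, c] = number of voters that predicted class c for sample i
    counts = np.zeros((n_samples, num_classes), dtype=np.float64)
    np.add.at(counts, (np.arange(n_samples)[:, None], votes), 1.0)
    probs = counts / n_voters
    # Treat 0 * log(0) as 0 by leaving those entries at zero
    logs = np.log(probs, where=probs > 0, out=np.zeros_like(probs))
    return -(probs * logs).sum(axis=1)


def select_core_set(votes: np.ndarray, num_classes: int, keep_fraction: float = 0.2):
    """Rank samples by divergence and keep the top fraction.

    Assumption: samples with the highest disagreement are kept.
    Returns (indices of kept samples, divergence score of every sample).
    """
    scores = vote_entropy(votes, num_classes)
    n_keep = max(1, int(round(keep_fraction * len(scores))))
    # Stable sort, highest divergence first; ties keep their original order
    order = np.argsort(-scores, kind="stable")
    return order[:n_keep], scores


if __name__ == "__main__":
    # Toy predictions from three hypothetical voter networks on five samples
    toy_votes = np.array([[0, 0, 0], [0, 1, 2], [1, 1, 0], [2, 2, 2], [0, 1, 1]])
    kept, scores = select_core_set(toy_votes, num_classes=3, keep_fraction=0.2)
    print("kept indices:", kept)
    print("divergence scores:", np.round(scores, 4))
```

The default `keep_fraction=0.2` mirrors the one-fifth core-set size reported in the abstract; in an iterative variant, the voters would be retrained and the scores recomputed between rounds.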

English keywords:

convolutional neural network; core set selection; supervised learning; active learning

Authors:

王纵驰, 刘健, 王培, 赵兴博, 于佳耕, 陶青川


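Author affiliations:

China National Aviation Fuel Group Limited (中国航空油料集团有限公司), Beijing 100088

Aerospace Shenzhou Smart System Technology Co., Ltd. (航天神舟智慧系统技术有限公司), Beijing 100029

College of Electronics and Information Engineering, Sichuan University (四川大学电子信息学院), Chengdu 610065

Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所), Beijing 100190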


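Keywords (Chinese):

卷积神经网络 (convolutional neural network); 核心数据集筛选 (core dataset selection); 有监督学习 (supervised learning); 主动学习 (active learning)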

Publication year:

2024

DOI:

10.3969/j.issn.1672-9722.2024.05.008

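计算机与数字工程 (Computer & Digital Engineering)

中国船舶重工集团公司第七0九研究所 (No. 709 Research Institute, China Shipbuilding Industry Corporation)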


CSTPCD

Impact factor: 0.355

ISSN: 1672-9722

Year, volume (issue): 2024, 52(5)

Number of references: 2