Training acceleration algorithm based on weight reuse
[Objective] Deep learning has been widely applied in many fields, including scientific research, teaching, and industrial production. However, because of the large amount of data and the complexity of model structures, the training stage relies on substantial computing resources. Knowledge transfer by reusing the weights of pretrained models is widely used in computer vision and natural language processing. For example, when training a detection network on the VOC or COCO dataset, a classification network pretrained on ImageNet is commonly used as the backbone for further training. Reusing weights trained on similar datasets helps improve the performance of the target task and also accelerates the training process. To improve resource utilization in experimental teaching and to help students become more proficient in data collection and in model parameter tuning and optimization, a training acceleration method based on weight reuse is proposed.

[Methods] Common weight reuse methods usually require a high degree of structural consistency between the pretrained network and the target network, which limits network expansion when exploring a suitable network structure. This paper proposes a more flexible knowledge transfer method that allows a network to reuse the weights of another network whose structure is similar but not completely consistent. Training an expanded network with this method is much faster than training from scratch. The algorithm expands the depth and width of the VGG and ResNet architectures, respectively, allowing the expanded models to reuse weights from structurally similar but not identical networks (a minimal illustration of the width-expansion idea is sketched after this abstract). Unlike network exploration schemes based on knowledge distillation, the proposed weight reuse method directly transforms the weights of the previously explored network to initialize the new network instead of training it from scratch. Because no teacher network is needed for guidance, there is no additional time or memory cost.

[Results] In the width expansion and depth expansion experiments, the training curves initialized with the proposed weight reuse method lie clearly to the left of the curves for training from random initialization, and their final performance is similar or even better. The experimental results on the CIFAR10 dataset show that training initialized with the weight reuse method converges faster, and the accuracy at the end of training is similar to that of training from random initialization, achieving the goal of accelerating the training of the expanded network. The method is a more flexible form of knowledge transfer and helps students focus on understanding and optimizing complex models.

[Conclusions] The proposed knowledge transfer method reuses pretrained network weights to transfer knowledge from small networks to large networks. It effectively accelerates training and, with its flexibility, facilitates the iterative expansion of network size during the design and validation stages. Training initialized with this method converges faster and reaches an accuracy similar to that of training from random initialization. In experimental courses, it removes the need for students to search for GPU computing resources after class and reduces the waiting time for model training, which helps students deepen their understanding of key scientific issues in instrument system design and improve their comprehensive innovation ability. Therefore, the proposed weight reuse algorithm has theoretical and practical value and can be used in deep learning experimental courses, effectively improving resource utilization and course learning progress.
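The following is a minimal sketch of the kind of width-expansion weight reuse described in [Methods], written in PyTorch. The function name widen_conv, the choice of copying the old filters into the first output channels, and the decision to leave the extra filters at their default random initialization are illustrative assumptions, not the authors' exact transformation rule.

    # Hypothetical sketch: initialize a wider convolutional layer by reusing
    # the weights of a narrower pretrained layer; the extra output channels
    # keep the wider layer's default (random) initialization.
    import torch
    import torch.nn as nn

    def widen_conv(old_conv: nn.Conv2d, new_out_channels: int) -> nn.Conv2d:
        """Build a wider Conv2d whose first filters are copied from old_conv."""
        assert new_out_channels >= old_conv.out_channels
        new_conv = nn.Conv2d(
            old_conv.in_channels,
            new_out_channels,
            kernel_size=old_conv.kernel_size,
            stride=old_conv.stride,
            padding=old_conv.padding,
            bias=old_conv.bias is not None,
        )
        with torch.no_grad():
            # Copy the pretrained filters into the first output channels.
            new_conv.weight[: old_conv.out_channels].copy_(old_conv.weight)
            if old_conv.bias is not None:
                new_conv.bias[: old_conv.out_channels].copy_(old_conv.bias)
        return new_conv

    # Usage: widen a 64-filter layer from a small VGG-style model to 96 filters.
    old = nn.Conv2d(3, 64, kernel_size=3, padding=1)
    wider = widen_conv(old, 96)

In a full expansion the next layer's input channels must also be enlarged and initialized consistently, and depth expansion would similarly require initializing the newly inserted layers; both are omitted here for brevity.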