Training acceleration algorithm based on weight reuse
[Objective] Deep learning has been widely applied in many fields, including scientific research, teaching, and industrial production. However, because of the large amount of data and the complexity of model structures, the training stage relies on substantial computing resources. Knowledge transfer by reusing the weights of pretrained models is widely used in computer vision and natural language processing. For example, when training a detection network on the VOC or COCO dataset, a classification network pretrained on ImageNet is commonly used as the backbone for further training. Reusing weights trained on similar datasets helps improve the performance of the target task and also accelerates the training process. To improve resource utilization in experimental teaching and to help students become more proficient in data collection and in model parameter tuning and optimization, a training acceleration method based on weight reuse is proposed.

[Methods] Common weight reuse methods usually require a high degree of structural consistency between the pretrained network and the target network, which limits network expansion when exploring a suitable network structure. This paper proposes a more flexible knowledge transfer method that allows a network to reuse the weights of another network whose structure is similar but not completely consistent. Training an expanded network with this method is much faster than training from scratch. The algorithm expands the depth and width of the VGG and ResNet architectures, respectively, allowing the expanded models to reuse weights from structurally similar but not identical networks (a minimal illustration of the width-expansion idea is sketched after this abstract). Unlike network exploration schemes based on knowledge distillation, the proposed weight reuse method directly transforms the weights of the previously explored network to initialize the new network instead of training it from scratch. Because no teacher network is needed for guidance, there is no additional time or memory cost.

[Results] In the width expansion and depth expansion experiments, the training curves initialized with the proposed weight reuse method lie clearly to the left of the curves for training from random initialization, and their final performance is similar or even better. The experimental results on the CIFAR10 dataset show that training initialized with the weight reuse method converges faster, and the accuracy at the end of training is similar to that of training from random initialization, achieving the goal of accelerating the training of the expanded network. The method is a more flexible form of knowledge transfer and helps students focus on understanding and optimizing complex models.

[Conclusions] The proposed knowledge transfer method reuses pretrained network weights to transfer knowledge from small networks to large networks. It effectively accelerates training and, with its flexibility, facilitates the iterative expansion of network size during the design and validation stages. Training initialized with this method converges faster and reaches an accuracy similar to that of training from random initialization. In experimental courses, it removes the need for students to search for GPU computing resources after class and reduces the waiting time for model training, which helps students deepen their understanding of key scientific issues in instrument system design and improve their comprehensive innovation ability. Therefore, the proposed weight reuse algorithm has theoretical and practical value and can be used in deep learning experimental courses, effectively improving resource utilization and course learning progress.
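The following is a minimal sketch of the kind of width-expansion weight reuse described in [Methods], written in PyTorch. The function name widen_conv, the choice of copying the old filters into the first output channels, and the decision to leave the extra filters at their default random initialization are illustrative assumptions, not the authors' exact transformation rule.

    # Hypothetical sketch: initialize a wider convolutional layer by reusing
    # the weights of a narrower pretrained layer; the extra output channels
    # keep the wider layer's default (random) initialization.
    import torch
    import torch.nn as nn

    def widen_conv(old_conv: nn.Conv2d, new_out_channels: int) -> nn.Conv2d:
        """Build a wider Conv2d whose first filters are copied from old_conv."""
        assert new_out_channels >= old_conv.out_channels
        new_conv = nn.Conv2d(
            old_conv.in_channels,
            new_out_channels,
            kernel_size=old_conv.kernel_size,
            stride=old_conv.stride,
            padding=old_conv.padding,
            bias=old_conv.bias is not None,
        )
        with torch.no_grad():
            # Copy the pretrained filters into the first output channels.
            new_conv.weight[: old_conv.out_channels].copy_(old_conv.weight)
            if old_conv.bias is not None:
                new_conv.bias[: old_conv.out_channels].copy_(old_conv.bias)
        return new_conv

    # Usage: widen a 64-filter layer from a small VGG-style model to 96 filters.
    old = nn.Conv2d(3, 64, kernel_size=3, padding=1)
    wider = widen_conv(old, 96)

In a full expansion the next layer's input channels must also be enlarged and initialized consistently, and depth expansion would similarly require initializing the newly inserted layers; both are omitted here for brevity.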