Study on Distributed Training Optimization Based on Hybrid Parallelism
Large-scale neural network training is a central topic in deep learning, and distributed training is one of the most effective approaches to training large neural networks across multiple nodes. Distributed training typically involves three parallelization strategies: data parallelism, inter-layer parallelism, and intra-layer parallelism. However, existing frameworks require manual model partitioning for inter-layer parallelism, which increases the abstraction complexity of model design. To address this issue, we propose a node-constrained relationship search algorithm that automates the model partitioning process. Moreover, in traditional data parallelism and inter-layer parallelism, strict serialization limits the overlap of computation and communication because of complex model dependencies and the required communication operations. To overcome this challenge, we introduce a synchronous optimization algorithm that enables computation and communication to overlap, effectively improving overall training efficiency. The experiments train GPT-2 models of different sizes as well as AlexNet, VGG16, and ResNet50. With the synchronous optimization algorithm under a 6-node configuration, the training performance of the GPT2-XL, GPT2-LARGE, and GPT2-MEDIUM models improves, with speedups of 1.14, 1.18, and 1.23, respectively. Under a 1-node configuration, performance gains are also observed for AlexNet, VGG16, and ResNet50, with speedups of 1.31, 1.14, and 1.03, respectively. The experimental results indicate that the synchronous optimization algorithm effectively improves training efficiency under hybrid parallelism.
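To make the compute/communication overlap idea concrete, the following is a minimal PyTorch sketch, not the paper's synchronous optimization algorithm: it launches an asynchronous gradient all-reduce for each parameter as soon as its gradient is accumulated, so communication proceeds while the backward pass continues on earlier layers. The helper names (attach_overlap_hooks, finish_overlap) are ours for illustration, and the sketch assumes torch.distributed has been initialized (e.g., via torchrun) and PyTorch >= 2.1 for register_post_accumulate_grad_hook.

```python
# Illustrative sketch of overlapping gradient communication with backward
# computation in data parallelism; not the paper's implementation.
import torch
import torch.distributed as dist


def attach_overlap_hooks(model: torch.nn.Module, handles: list) -> None:
    """Start an async all-reduce on each parameter's gradient as soon as it
    is accumulated, overlapping communication with the rest of backward."""
    def hook(param: torch.nn.Parameter) -> None:
        work = dist.all_reduce(param.grad, op=dist.ReduceOp.SUM, async_op=True)
        handles.append((work, param))

    for p in model.parameters():
        if p.requires_grad:
            p.register_post_accumulate_grad_hook(hook)


def finish_overlap(handles: list, world_size: int) -> None:
    """Wait for all outstanding all-reduces and average the gradients
    before the optimizer step."""
    for work, param in handles:
        work.wait()
        param.grad.div_(world_size)
    handles.clear()


# Typical training-step usage (model, optimizer, loss assumed defined):
#   handles = []
#   attach_overlap_hooks(model, handles)        # once, after model creation
#   loss.backward()                             # all-reduces start during backward
#   finish_overlap(handles, dist.get_world_size())
#   optimizer.step()
```

The design choice here mirrors the general principle behind such overlap schemes: communication for already-finished layers is hidden behind the computation of layers that are still running backward, so the synchronization cost is only paid once, right before the optimizer update.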