Efficient training has become one of the key factors affecting the adoption of large model applications. The main technologies for efficient training of large models are analyzed and discussed following the general training process: data preparation, dataloader, model initialization and evaluation, training parallelism, and model state preservation. As large models continue to grow in scale and the types of data they process expand, existing large model training technologies still leave considerable room for optimization. Key future research directions for large model training include data-centric approaches, intelligent dataloaders and heterogeneous acceleration, customization of network communication, and training parallelism and automation.
Keywords: large model; data preparation; dataloader; model initialization; model evaluation; training parallelism; training network; checkpoint