Heterogeneous Parallel Computing and Performance Optimization for DSMC/PIC Coupled Simulation Based on MPI+CUDA
DSMC/PIC coupled simulation is an important high-performance computing application that demands efficient parallel computing for large-scale simulations. Because particles are dynamically injected and migrate between subdomains, DSMC/PIC coupled simulations based on MPI parallelism often suffer from large communication overheads and struggle to achieve load balance. To address these issues, we design and implement an efficient MPI+CUDA heterogeneous parallel algorithm built on our self-developed DSMC/PIC simulation software. Combining the characteristics of the GPU architecture with those of the DSMC/PIC computation, we conduct a series of performance optimizations, including GPU memory access optimization, GPU thread workload optimization, CPU-GPU data transmission optimization, and DSMC/PIC data conflict optimization. We perform large-scale DSMC/PIC coupled heterogeneous parallel simulations with billions of particles on NVIDIA V100 and A100 GPUs in the Beijing Beilong Super Cloud HPC system for a pulsed vacuum arc plasma jet application. Compared with the original pure MPI parallelism, the GPU heterogeneous parallelism significantly reduces simulation time, with a speedup of 550% on two GPU cards compared with 192 CPU cores, while maintaining better strong scalability.