Time Cost Model and Optimal Configuration Method for GPU Parallel Computation of Matrix Multiplication
Horizontal matrix and vertical matrix multiplication (HVM) is one of the fundamental calculations in scientific computing and engineering, as it largely affects the computational efficiency of higher-level algorithms. GPU parallel computing has become one of the mainstream parallel computing methods, and its underlying design makes it highly suitable for large-scale multiplication calculations. Numerous studies have focused on designing matrix structures and optimizing matrix multiplication using GPU parallel computing frameworks. However, there has been a lack of GPU parallel algorithms and optimization methods specifically targeting HVM. Furthermore, the configuration of GPU kernel functions directly affects computational efficiency, but studies on the optimal configuration of kernel functions have been extremely limited, typically requiring researchers to set them heuristically based on the specific computational characteristics of the algorithm. This paper designs a parallel HVM algorithm, PHVM, based on the GPU's thread and memory model. The numerical experimental results show that when the horizontal dimension of the left matrix is much larger than the vertical dimension, PHVM significantly outperforms the general matrix multiplication in the NVIDIA cuBLAS library. Furthermore, this paper establishes an optimal theoretical model for the kernel function configuration of PHVM at runtime based on GPU hardware parameters. The numerical experimental results indicate that this theoretical model accurately reflects the trend of changes in PHVM algorithm runtime as the kernel function configuration (grid size and thread block size) varies.
Matrix multiplication; GPU; CUDA; Kernel function configuration
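The shape regime the abstract describes can be sketched as follows. This is a minimal NumPy illustration, not the paper's PHVM GPU implementation: a "horizontal" left matrix A (few rows, many columns) is multiplied by a "vertical" right matrix B (many rows, few columns), and a block-wise accumulation over the long inner dimension mimics how a GPU kernel might split that dimension across thread blocks and sum partial products. All names and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 100_000                 # n << N: A's horizontal dimension far exceeds its vertical one
A = rng.standard_normal((n, N))   # horizontal matrix (short and wide)
B = rng.standard_normal((N, n))   # vertical matrix (tall and narrow)

# Full product: the result is a small n x n matrix despite the large inner dimension.
C = A @ B

# Block-wise partial products over the inner dimension, then a sum (reduction).
# This decomposition, not this code, is the kind of parallel structure a GPU
# kernel can exploit; the block size here is an arbitrary illustrative choice.
block = 8192
C_blocked = sum(A[:, s:s + block] @ B[s:s + block, :] for s in range(0, N, block))

assert C.shape == (n, n)
assert np.allclose(C, C_blocked)
```

Because the output is only n x n while the work grows with N, the inner-dimension reduction dominates the computation, which is why a dedicated decomposition (and a well-chosen grid/block configuration) matters in this regime.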