A scalable parallel structured matrix multiplication algorithm framework
Structured matrices such as Cauchy, Toeplitz, Vandermonde, and Hankel matrices play an important role in scientific computing and engineering applications. Although these matrices are dense, they can be expressed with only O(n) parameters (generators), where n is the dimension of the matrix. The core idea of the algorithm in this paper is to use the matrix generators to explicitly construct the local matrix blocks of each process, thereby reducing communication overhead. Additionally, by leveraging the numerical low-rank property of these matrix blocks, the algorithm further reduces computational overhead. Consequently, the proposed parallel structured matrix multiplication framework reduces both computational and communication costs, and it can be applied to matrix multiplication algorithms such as Cannon, Fox, and PUMMA. Extensive numerical tests were conducted on the Tianhe-2 supercomputer, and the results demonstrate that the proposed algorithm achieves an 8.96× speedup over the PDGEMM function in ScaLAPACK.
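The two ideas above can be illustrated with a minimal single-process sketch in Python with NumPy. It is not the paper's implementation. It assumes a Cauchy matrix with entries 1 / (x[i] - y[j]), uses a truncated SVD as a stand-in for whatever low-rank compression the paper actually applies, and omits all parallel communication. The helper names `cauchy_block` and `low_rank_factors`, the tolerance, and the choice of generators are hypothetical.

```python
import numpy as np

def cauchy_block(x, y, rows, cols):
    """Build the sub-block C[rows, cols] of the Cauchy matrix
    C[i, j] = 1 / (x[i] - y[j]) directly from the generators x and y.
    A process that holds x and y needs no matrix entries from other processes."""
    return 1.0 / (x[rows][:, None] - y[cols][None, :])

def low_rank_factors(block, tol=1e-10):
    """Truncated SVD (an assumed stand-in for the paper's compression):
    return U, V such that block is approximately U @ V."""
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    rank = max(1, int(np.sum(s > tol * s[0])))
    return u[:, :rank] * s[:rank], vt[:rank, :]

n = 256
x = np.linspace(1.0, 2.0, n)     # row generator (hypothetical values)
y = np.linspace(-2.0, -1.0, n)   # column generator, well separated from x

# Index sets of the block that one process would own in a 2 x 2 process grid.
rows = np.arange(0, n // 2)
cols = np.arange(n // 2, n)

block = cauchy_block(x, y, rows, cols)   # built locally from O(n) data
U, V = low_rank_factors(block)           # numerical rank is far below n / 2

# Multiply the block by a local piece of the right-hand matrix.
B = np.random.default_rng(0).standard_normal((len(cols), 8))
approx = U @ (V @ B)                     # two thin products instead of one dense product
exact = block @ B
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print("numerical rank:", U.shape[1], "relative error:", rel_err)
```

In this sketch the dense block product costs on the order of m·k·p operations for an m × k block and a k × p right-hand side, whereas the factored form costs on the order of r·(m + k)·p for numerical rank r, which is where the computational saving comes from when r is small.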