Optimizing Distributed GMRES Algorithm with Mixed Precision
The generalized minimum residual (GMRES) method is an iterative method for solving sparse linear systems. It is widely used in areas such as scientific and engineering computing. Exponential data growth is rapidly expanding the scale of the problems solved with GMRES. To support large-scale problems, researchers have implemented distributed GMRES algorithms on clusters. However, current inter-node networks still lag significantly behind intra-node fabrics in both bandwidth and latency, which greatly limits the performance of distributed GMRES. This paper proposes a mixed-precision approach for optimizing the GMRES algorithm on GPU clusters, in which transferred data is represented in a low-precision format so that network traffic during inter-GPU communication is significantly reduced. In addition, this paper proposes a balancing algorithm that dynamically adjusts the precision of the transferred data so that the required residual is still achieved. Experimental results show that the proposed method achieves an average speedup of 2.4×, and a further average speedup of 7.6× when combined with other optimizations.
Generalized minimum residual; Mixed precision; GPU cluster; Distributed system
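The abstract does not spell out how the precision of transferred data is chosen, so the following is only a minimal sketch of the general idea, not the paper's balancing algorithm. It assumes a hypothetical rule: before a vector is sent, pick the narrowest IEEE 754 format whose round-trip error stays below a given relative tolerance. The function names `compress`, `decompress`, and `choose_bits`, and the tolerance parameter, are illustrative assumptions; the network transfer itself is simulated by a byte buffer.

```python
import math
import struct

# struct format codes for IEEE 754 half, single, and double precision.
FORMATS = {16: "e", 32: "f", 64: "d"}


def compress(values, bits):
    """Pack floats into bytes at the given precision (stands in for a send buffer)."""
    return struct.pack(f"<{len(values)}{FORMATS[bits]}", *values)


def decompress(payload, bits):
    """Unpack bytes received from a peer back into Python floats."""
    count = len(payload) // (bits // 8)
    return list(struct.unpack(f"<{count}{FORMATS[bits]}", payload))


def choose_bits(values, tolerance):
    """Hypothetical selection rule (an assumption, not the paper's method):
    return the narrowest format whose round-trip relative error in the
    2-norm does not exceed `tolerance`."""
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    for bits in (16, 32, 64):
        try:
            restored = decompress(compress(values, bits), bits)
        except (OverflowError, struct.error):
            continue  # a value is out of range for this format
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(values, restored)))
        if err / norm <= tolerance:
            return bits
    return 64


if __name__ == "__main__":
    vector = [0.1, 0.2, 0.3]
    for tol in (1e-2, 1e-6, 1e-12):
        bits = choose_bits(vector, tol)
        print(f"tolerance={tol:g}: {bits}-bit, {len(compress(vector, bits))} bytes")
```

In this sketch a looser tolerance selects a narrower format, and the payload shrinks in proportion to the bit width, which is the source of the traffic reduction the abstract describes. A real implementation would tie the tolerance to the solver's current residual rather than fix it in advance.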