Gloo+: Accelerating distributed training of deep learning using in-network computing
Collective communication is the main communication pattern in distributed deep learning training, and research on optimizing it falls into software-level and hardware-level approaches. SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) is a network offload protocol for collective communication proposed by Mellanox. It optimizes collective communication in hardware by offloading collective operations to switches in the network, thereby shortening collective communication time. We integrated SHARP into Gloo and designed and implemented Gloo+, a collective communication library that accelerates distributed deep learning training through in-network computing. Our experimental evaluation shows that in benchmark tests with small message sizes, the speedup of Gloo+ over Gloo can reach 100x or more; over MPI in Ethernet mode it can reach 50x or more, and over MPI in InfiniBand (IB) mode it stays within 10x. In practical distributed deep learning training, Gloo+ reaches a speedup of up to 1.1x over Gloo and 1.3x over MPI in Ethernet mode, but only 0.5x relative to MPI in IB mode, i.e., it remains slower than MPI over InfiniBand in that setting.
Keywords: distributed deep learning; collective communication; in-network computing; Gloo; SHARP
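As an illustration of the software baseline that Gloo+ extends, the sketch below shows a standard ring allreduce issued through Gloo's public C++ API; this is the class of collective operation that SHARP-capable switches can execute in the network instead of on the hosts. The network interface name, rendezvous path, and hard-coded rank/size values are illustrative assumptions, and the sketch uses only stock Gloo, since the abstract does not expose Gloo+'s own API.

#include <memory>
#include <vector>

#include "gloo/allreduce_ring.h"
#include "gloo/rendezvous/context.h"
#include "gloo/rendezvous/file_store.h"
#include "gloo/transport/tcp/device.h"

int main() {
  // TCP transport device bound to a network interface (name is illustrative).
  gloo::transport::tcp::attr attr;
  attr.iface = "eth0";
  auto device = gloo::transport::tcp::CreateDevice(attr);

  // Rendezvous store on a shared filesystem, used by all participants to
  // exchange connection details (path is illustrative).
  gloo::rendezvous::FileStore store("/tmp/gloo");

  // Each process knows its rank and the total number of participants;
  // hard-coded here for a two-process example.
  const int rank = 0;
  const int size = 2;
  auto context = std::make_shared<gloo::rendezvous::Context>(rank, size);
  context->connectFullMesh(store, device);

  // Buffer to be reduced in place across all ranks (e.g., gradients).
  std::vector<float> data(1024, static_cast<float>(rank));
  std::vector<float*> ptrs = {data.data()};

  // Ring-based software allreduce: data moves host-to-host around a ring.
  // This host-side reduction is what in-network computing offloads.
  gloo::AllreduceRing<float> allreduce(context, ptrs,
                                       static_cast<int>(data.size()));
  allreduce.run();

  return 0;
}

Because Gloo exposes collectives behind a uniform algorithm interface, a SHARP-backed implementation can in principle be swapped in without changing the caller's code, which is consistent with the abstract's description of Gloo+ as a drop-in accelerated library built on Gloo.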