Parallel Optimization Research on 2D Convolution Operator Based on Img2col on DCU Accelerator
In deep learning, the immense computational demands of convolution operations often become a performance bottleneck for large convolutional neural networks. A strategy was proposed that leverages parallel techniques to optimize convolution operations. First, the traditional 2D convolution operator was reconstructed into a general matrix multiplication (GEMM) via img2col. Second, shared memory and data prefetching techniques were used to reduce the number of memory accesses. Finally, the parallel algorithm was tuned to the architecture of the DCU accelerator, further improving computational performance. Experimental results show that this method achieves nearly a 7.5-fold speedup over traditional implementations. This optimization strategy significantly enhances the efficiency of convolution operations.
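The first step the abstract describes, reconstructing 2D convolution as a general matrix multiplication via img2col, can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the paper's DCU implementation; the function names `im2col` and `conv2d_gemm` are illustrative, and batching, padding, and bias are omitted for brevity:

```python
import numpy as np

def im2col(x, kh, kw, stride=1):
    """Unfold a (C, H, W) input into a (C*kh*kw, oh*ow) column matrix,
    so that convolution reduces to one matrix multiplication."""
    c, h, w = x.shape
    oh = (h - kh) // stride + 1
    ow = (w - kw) // stride + 1
    cols = np.empty((c * kh * kw, oh * ow), dtype=x.dtype)
    col = 0
    for i in range(oh):
        for j in range(ow):
            # Each receptive field becomes one column of the matrix.
            patch = x[:, i*stride:i*stride+kh, j*stride:j*stride+kw]
            cols[:, col] = patch.reshape(-1)
            col += 1
    return cols, oh, ow

def conv2d_gemm(x, weights, stride=1):
    """Convolve x (C, H, W) with weights (K, C, kh, kw) via img2col + GEMM."""
    k, c, kh, kw = weights.shape
    cols, oh, ow = im2col(x, kh, kw, stride)
    w_mat = weights.reshape(k, -1)   # flatten filters to (K, C*kh*kw)
    out = w_mat @ cols               # the single GEMM
    return out.reshape(k, oh, ow)
```

On an accelerator, the payoff is that this one GEMM maps onto highly tuned matrix-multiply kernels (tiled over shared memory, as the abstract's second step describes), at the cost of the extra memory the unfolded column matrix occupies.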