Parallel Optimization Research on 2D Convolution Operator Based on Img2col on DCU Accelerator

In deep learning, the heavy computational demand of convolution often makes it the bottleneck that limits the performance of large convolutional neural networks. To address this, a strategy is proposed that uses parallel techniques to optimize the convolution operation. First, the traditional 2D convolution operator is restructured so that it becomes a general matrix multiplication (GEMM). Second, shared memory and data prefetching are used to reduce the number of memory accesses. Finally, the parallel scheme of the algorithm is adjusted to match the hardware architecture of the accelerator, improving computational performance. Experimental results show that, compared with the traditional computation method, the optimized strategy speeds up the convolution operation by nearly 7.5 times.
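The first step of the abstract, recasting 2D convolution as a matrix multiplication, can be illustrated with a minimal CPU sketch. This is not the authors' DCU kernel: it assumes stride 1, no padding, a single image in channel x height x width layout stored as nested lists, and the cross-correlation convention common in deep learning. The function names `im2col`, `matmul`, `conv2d_im2col`, and `conv2d_direct` are hypothetical and chosen only for this example.

```python
def im2col(image, kh, kw):
    """Unfold a C x H x W image into a (C*kh*kw) x (out_h*out_w) matrix.

    Assumes stride 1 and no padding.
    """
    height, width = len(image[0]), len(image[0][0])
    out_h, out_w = height - kh + 1, width - kw + 1
    cols = []
    for c in range(len(image)):
        for i in range(kh):
            for j in range(kw):
                # One row per (channel, kernel offset), one column per output pixel.
                cols.append([image[c][y + i][x + j]
                             for y in range(out_h) for x in range(out_w)])
    return cols, out_h, out_w


def matmul(a, b):
    """Plain triple-loop matrix product, standing in for an optimized GEMM."""
    inner, n = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(n)]
            for row in a]


def conv2d_im2col(image, kernels):
    """Convolve via im2col + GEMM; kernels has shape K x C x kh x kw."""
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    cols, out_h, out_w = im2col(image, kh, kw)
    # Flatten each filter in (channel, row, column) order to match im2col's rows.
    weights = [[w for ch in k for row in ch for w in row] for k in kernels]
    flat = matmul(weights, cols)
    # Reshape each flat output row back into an out_h x out_w feature map.
    return [[r[y * out_w:(y + 1) * out_w] for y in range(out_h)] for r in flat]


def conv2d_direct(image, kernels):
    """Reference sliding-window convolution, used to check the result."""
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    out_h = len(image[0]) - kh + 1
    out_w = len(image[0][0]) - kw + 1
    return [[[sum(image[c][y + i][x + j] * k[c][i][j]
                  for c in range(len(image))
                  for i in range(kh) for j in range(kw))
              for x in range(out_w)]
             for y in range(out_h)]
            for k in kernels]


# Usage: one 3x3 single-channel image and one 2x2 filter.
image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
kernels = [[[[1, 0], [0, 1]]]]
result = conv2d_im2col(image, kernels)  # [[[6, 8], [12, 14]]]
```

The unfolded matrix repeats input pixels, trading extra memory for a regular GEMM access pattern. That regular pattern is what makes techniques such as shared-memory tiling and prefetching applicable on an accelerator.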

Keywords: convolution operator; parallel computing; optimization

Authors: Zhou Quan (周全), Li Qiang (李强), Tao Shun'an (陶顺安), Han Zhaobing (韩兆冰), Lu Lu (卢璐)

College of Computer Science and Technology, Qingdao University, Qingdao 266071, China

2024

Journal of Qingdao University (Natural Science Edition)
Qingdao University

Impact factor: 0.248
ISSN: 1006-1037
Year, volume (issue): 2024, 37(4)