Parallel Optimization Research on 2D Convolution Operator Based on Img2col on DCU Accelerator
In deep learning, the immense computational demands of convolution operations often become a performance bottleneck for large convolutional neural networks. A strategy was proposed that leverages parallel techniques to optimize convolution operations. First, the traditional 2D convolution operator was reconstructed into a general matrix multiplication (GEMM) via img2col. Second, shared memory and data prefetching techniques were used to reduce the number of memory accesses. Finally, the parallel algorithm was tuned to the architecture of the DCU accelerator, further improving computational performance. Experimental results show that this method achieves nearly a 7.5-fold speedup over traditional implementations. This optimization strategy significantly enhances the efficiency of convolution operations.
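The first step the abstract describes, reconstructing 2D convolution as a general matrix multiplication via img2col, can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the paper's DCU implementation; the function names `im2col` and `conv2d_gemm` are illustrative, and batching, padding, and bias are omitted for brevity:

```python
import numpy as np

def im2col(x, kh, kw, stride=1):
    """Unfold a (C, H, W) input into a (C*kh*kw, oh*ow) column matrix,
    so that convolution reduces to one matrix multiplication."""
    c, h, w = x.shape
    oh = (h - kh) // stride + 1
    ow = (w - kw) // stride + 1
    cols = np.empty((c * kh * kw, oh * ow), dtype=x.dtype)
    col = 0
    for i in range(oh):
        for j in range(ow):
            # Each receptive field becomes one column of the matrix.
            patch = x[:, i*stride:i*stride+kh, j*stride:j*stride+kw]
            cols[:, col] = patch.reshape(-1)
            col += 1
    return cols, oh, ow

def conv2d_gemm(x, weights, stride=1):
    """Convolve x (C, H, W) with weights (K, C, kh, kw) via img2col + GEMM."""
    k, c, kh, kw = weights.shape
    cols, oh, ow = im2col(x, kh, kw, stride)
    w_mat = weights.reshape(k, -1)   # flatten filters to (K, C*kh*kw)
    out = w_mat @ cols               # the single GEMM
    return out.reshape(k, oh, ow)
```

On an accelerator, the payoff is that this one GEMM maps onto highly tuned matrix-multiply kernels (tiled over shared memory, as the abstract's second step describes), at the cost of the extra memory the unfolded column matrix occupies.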