
Parallel optimization of convolution algorithm on multi-core DSP

This work targets the heterogeneous multi-core digital signal processing (DSP) chip independently developed by the National University of Defense Technology. Based on the characteristics of the chip and of the convolution algorithm itself, a high-performance multi-core parallel convolution implementation for the multi-core DSP architecture is proposed. For 1×1 convolution, a feature-map-level multi-core parallel scheme is proposed. For convolutions with kernel sizes greater than 1, a window-level multi-core parallel optimization design is proposed, and an intra-core parallel optimization based on element-wise vectorized computation is also proposed. Experimental results show that the proposed parallel optimization method reaches a maximum single-core computing efficiency of 64.95%. Under bandwidth-limited conditions, the multi-core parallel scaling efficiency reaches 48.36% to 88.52%. On the typical network ResNet50, the implementation achieves a 5.39× speedup over an E5-2640 CPU.
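To make the idea of feature-map-level parallelism for 1×1 convolution concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "feature-map-level" means assigning different output feature maps (output channels) to different cores, which the abstract does not spell out. The function names `conv1x1_channels` and `conv1x1_parallel`, the round-robin assignment, and the use of Python threads as stand-ins for DSP cores are all hypothetical choices made for illustration. The threads only demonstrate the work partitioning and are not expected to give a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def conv1x1_channels(x, w, out_channels):
    """Compute the requested output feature maps of a 1x1 convolution.

    x: input stored as x[c_in][pixel] (each feature map flattened).
    w: weights stored as w[c_out][c_in].
    Returns a dict mapping output channel index -> flattened feature map.
    """
    num_pixels = len(x[0])
    return {
        co: [sum(w[co][ci] * x[ci][p] for ci in range(len(x)))
             for p in range(num_pixels)]
        for co in out_channels
    }


def conv1x1_parallel(x, w, num_cores):
    """Split output feature maps across num_cores hypothetical cores."""
    c_out = len(w)
    # Assumed round-robin assignment: core k gets channels k, k+num_cores, ...
    chunks = [range(k, c_out, num_cores) for k in range(num_cores)]
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        partial = list(pool.map(lambda ch: conv1x1_channels(x, w, ch), chunks))
    # Each core produced a disjoint set of output maps; merge them in order.
    merged = {}
    for part in partial:
        merged.update(part)
    return [merged[co] for co in range(c_out)]


# Tiny example: 2 input channels, 3 output channels, a 2x2 map flattened to 4 pixels.
x = [[1.0, 2.0, 3.0, 4.0],
     [0.5, 0.5, 0.5, 0.5]]
w = [[1.0, 0.0],
     [0.0, 2.0],
     [1.0, 1.0]]
y = conv1x1_parallel(x, w, num_cores=2)
```

Because a 1×1 convolution has no spatial window, every output feature map depends only on the input pixels at the same position, so the per-core results are disjoint and can be merged without any reduction step. That property is what makes this kind of partitioning a natural fit for the 1×1 case.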

multi-core DSP; CNNs; convolutional algorithms; parallel optimization

XU Jinwei, WANG Qinglin, LI Yalin, JIANG Jingfei, GAO Lei, LI Rongchun, LI Dongsheng


College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China

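National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, Hunan, China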


National Natural Science Foundation of China

61732018

2024

Journal of National University of Defense Technology
National University of Defense Technology


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.517
ISSN: 1001-2486
Year, volume (issue): 2024, 46(1)