Matrix Multiplication Acceleration Based on NEON Parallel Computing Architecture

The demand for signal processing on computers is growing steadily. With the rapid development of the ARM architecture and the rise of Chinese domestic processors built on it, general-purpose signal processing acceleration for the ARM platform is an important research topic. This work analyzes the ARMv8 architecture and NEON parallel computing technology and, using the FT-2000/4 (ARMv8 architecture) as the experimental platform, studies the optimization and acceleration of a typical DSP function library on ARMv8. Taking matrix operations as an example, it proposes a NEON-based general matrix multiplication algorithm. Experimental results show that the proposed algorithm achieves a significant speedup on the ARM architecture. The work provides technical support for building a comprehensive and efficient general signal processing library for the ARM architecture.
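The abstract does not give the details of the proposed algorithm, so the following is only a minimal sketch of what a NEON-vectorized single-precision matrix multiplication can look like, not the authors' method. The function name `sgemm_neon_sketch`, the row-major layout, and the choice of processing four output columns per 128-bit register are all assumptions made for illustration. On targets without NEON the code falls back to the plain scalar loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper (name and signature are assumptions, not from the paper).
 * Computes C = A * B for row-major float32 matrices:
 * A is m x k, B is k x n, C is m x n. */
void sgemm_neon_sketch(size_t m, size_t n, size_t k,
                       const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < m; ++i) {
        size_t j = 0;
#if HAVE_NEON
        /* Vector path: four adjacent output columns per 128-bit register. */
        for (; j + 4 <= n; j += 4) {
            float32x4_t acc = vdupq_n_f32(0.0f);
            for (size_t p = 0; p < k; ++p) {
                /* Load B[p][j..j+3] and accumulate A[i][p] * B[p][j..j+3]. */
                float32x4_t brow = vld1q_f32(&b[p * n + j]);
                acc = vmlaq_n_f32(acc, brow, a[i * k + p]);
            }
            vst1q_f32(&c[i * n + j], acc);
        }
#endif
        /* Scalar tail; this is also the whole path when NEON is unavailable. */
        for (; j < n; ++j) {
            float sum = 0.0f;
            for (size_t p = 0; p < k; ++p)
                sum += a[i * k + p] * b[p * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

A production kernel would typically add register blocking over several rows of A and cache-aware tiling; those refinements are omitted here to keep the sketch short.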

Keywords: general signal processing; ARMv8; FT-2000/4; NEON; matrix multiplication

Qi Junxiong, Cheng Yue, Liu Zuolong, Han Wei, Pan Yan, Li Chenhui


Xi'an Aeronautics Computing Technique Research Institute, AVIC, Xi'an, Shaanxi 710000, China


Funding: Aeronautical Science Foundation of China (Grant No. 2022Z071031001)

2024

Aeronautical Computing Technique
Xi'an Aeronautics Computing Technique Research Institute, AVIC

CSTPCD
Impact factor: 0.316
ISSN:1671-654X
Year, volume (issue): 2024, 54(3)