Matrix Multiplication Acceleration Based on NEON Parallel Computing Architecture

祁俊雄¹, 程岳¹, 刘作龙¹, 韩伟¹, 潘妍¹, 李晨卉¹

Author information

1. Xi'an Aeronautics Computing Technique Research Institute, AVIC, Xi'an, Shaanxi 710000, China

Abstract

Demand for signal processing on computers is growing steadily. With the ARM architecture developing rapidly and Chinese domestic processors based on it on the rise, general-purpose signal processing acceleration for the ARM platform is an important research topic. This work analyzes the ARMv8 architecture and NEON parallel computing technology and, using the FT-2000/4 (ARMv8 architecture) as the experimental platform, studies how to optimize and accelerate a typical DSP function library on ARMv8. Taking matrix operations as an example, it proposes a NEON-based general matrix multiplication algorithm. Experimental results show that the proposed algorithm achieves a significant speedup on the ARM architecture. The work provides technical support for building a comprehensive and efficient general signal processing library for the ARM architecture.
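The abstract does not give the details of the proposed algorithm, so the following is only a minimal sketch of the general idea behind a NEON-based matrix multiplication: one 128-bit register holds four single-precision values, so four adjacent elements of an output row can be accumulated at once. The function names `gemm_ref` and `gemm_neon`, the row-major layout, and the restriction that the column count is a multiple of 4 are assumptions made for illustration and are not taken from the paper. A scalar fallback is included so that the sketch also compiles on non-ARM hosts.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar reference: C = A * B, row-major; A is m x k, B is k x n. */
void gemm_ref(const float *A, const float *B, float *C,
              size_t m, size_t k, size_t n)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
    }
}

/* Hypothetical vectorized variant (assumption: n is a multiple of 4).
 * Each inner iteration updates four adjacent outputs C[i][j..j+3]
 * by broadcasting A[i][p] against four contiguous elements of row p of B. */
void gemm_neon(const float *A, const float *B, float *C,
               size_t m, size_t k, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; j += 4) {
#if defined(__ARM_NEON)
            float32x4_t acc = vdupq_n_f32(0.0f);
            for (size_t p = 0; p < k; ++p) {
                float32x4_t b = vld1q_f32(&B[p * n + j]); /* B[p][j..j+3] */
                acc = vmlaq_n_f32(acc, b, A[i * k + p]);  /* acc += b * A[i][p] */
            }
            vst1q_f32(&C[i * n + j], acc);
#else
            /* Portable fallback with the same access pattern. */
            float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
            for (size_t p = 0; p < k; ++p)
                for (size_t l = 0; l < 4; ++l)
                    acc[l] += A[i * k + p] * B[p * n + j + l];
            for (size_t l = 0; l < 4; ++l)
                C[i * n + j + l] = acc[l];
#endif
        }
    }
}
```

A production kernel would typically also block over several rows and over the inner dimension to reuse loaded registers and cache lines; that is omitted here to keep the access pattern easy to follow.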

Key words

general signal processing; ARMv8; FT-2000/4; NEON; matrix multiplication

Funding

Aeronautical Science Foundation of China (2022Z071031001)

Publication year

2024

Aeronautical Computing Technique (航空计算技术)

Xi'an Aeronautics Computing Technique Research Institute, AVIC

CSTPCD

Impact factor: 0.316

ISSN: 1671-654X

Number of references: 8
