

Performance evaluation and analysis of vectorized SpMV algorithm based on scratchpad memory
Scratchpad memory (SPM) is an on-chip high-speed memory with a simple structure, fixed access latency, and direct software control, and it is widely used in modern processor design. Sparse matrix-vector multiplication (SpMV) is one of the critical kernel computations in high-performance computing, artificial intelligence, and other application domains. On traditional processors with a multi-level cache hierarchy, the irregular accesses to the dense input vector during SpMV cause a large number of cache misses, which degrades the execution efficiency of the SpMV algorithm. To evaluate the performance impact of scratchpad memory on vectorized SpMV, this paper uses ARM Scalable Vector Extension (SVE) instructions to vectorize the SpMV algorithm based on the compressed sparse row (CSR) format. It stores the hot data, namely the dense input vector, in scratchpad memory and analyzes the performance of the vectorized SpMV algorithm on an ARM-based processor integrated with scratchpad memory. Experiments are conducted in the gem5 simulator on 2,562 sparse matrices from real-world applications. The results show that, compared with a traditional multi-level cache processor, the processor integrated with scratchpad memory achieves a maximum speedup of 7.45 and an average speedup of 1.11 for the vectorized SpMV algorithm.
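The kernel described in the abstract can be illustrated with a minimal C sketch. This is not the authors' code. It assumes double-precision values and 64-bit CSR indices, and the names `csr_matrix`, `spm_x`, `SPM_DOUBLES`, `EMU_VL`, `row_dot` and `spmv_csr_spm` are hypothetical. The static array `spm_x` only stands in for scratchpad memory, whereas real SPM is a dedicated on-chip address range. When the code is compiled with SVE support, each row is processed by a predicated loop: `svwhilelt` masks the row tail, a gather load fetches `x[col_idx[j]]`, and a multiply-add accumulates the products. The width of this loop follows the hardware vector length. On other targets, a portable loop emulates the same predicated lanes with an assumed width of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef __ARM_FEATURE_SVE
#include <arm_sve.h>
#endif

/* CSR storage: row i owns nonzeros row_ptr[i] .. row_ptr[i+1]-1. */
typedef struct {
    size_t n_rows;
    const int64_t *row_ptr;
    const int64_t *col_idx;
    const double *val;
} csr_matrix;

/* Hypothetical stand-in for scratchpad memory: a fixed-capacity buffer.
   Real SPM is a dedicated on-chip address range; this is only a model. */
#define SPM_DOUBLES 4096
static double spm_x[SPM_DOUBLES];

#ifndef __ARM_FEATURE_SVE
#define EMU_VL 4 /* emulated vector length in doubles (assumed, not from the paper) */
#endif

/* Dot product of one CSR row with x, written as a predicated vector loop. */
static double row_dot(const csr_matrix *A, const double *x,
                      int64_t start, int64_t end) {
#ifdef __ARM_FEATURE_SVE
    svfloat64_t acc = svdup_f64(0.0);
    for (int64_t j = start; j < end; j += (int64_t)svcntd()) {
        svbool_t pg = svwhilelt_b64_s64(j, end);                /* mask off the row tail */
        svfloat64_t v = svld1_f64(pg, &A->val[j]);
        svint64_t idx = svld1_s64(pg, &A->col_idx[j]);
        svfloat64_t xv = svld1_gather_s64index_f64(pg, x, idx); /* irregular x access */
        acc = svmla_f64_m(pg, acc, v, xv);                      /* acc += v * xv on active lanes */
    }
    return svaddv_f64(svptrue_b64(), acc);                      /* horizontal reduction */
#else
    double acc[EMU_VL] = {0.0};
    for (int64_t j = start; j < end; j += EMU_VL) {
        for (int lane = 0; lane < EMU_VL; ++lane) {
            if (j + lane < end) {                               /* same role as svwhilelt */
                acc[lane] += A->val[j + lane] * x[A->col_idx[j + lane]];
            }
        }
    }
    double sum = 0.0;
    for (int lane = 0; lane < EMU_VL; ++lane) {
        sum += acc[lane];                                       /* horizontal reduction */
    }
    return sum;
#endif
}

/* y = A * x. The dense input vector x is first copied into the SPM model
   when it fits; reading x from main memory otherwise is an assumption. */
static void spmv_csr_spm(const csr_matrix *A, const double *x,
                         size_t n_cols, double *y) {
    assert(A != NULL && x != NULL && y != NULL);
    const double *xin = x;
    if (n_cols <= SPM_DOUBLES) {
        memcpy(spm_x, x, n_cols * sizeof(double));
        xin = spm_x;
    }
    for (size_t i = 0; i < A->n_rows; ++i) {
        y[i] = row_dot(A, xin, A->row_ptr[i], A->row_ptr[i + 1]);
    }
}
```

As an example, take the 3×4 matrix with rows (1, 0, 2, 0), (0, 0, 0, 3) and (4, 5, 0, 6). In CSR form it is `row_ptr = {0, 2, 3, 6}`, `col_idx = {0, 2, 3, 0, 1, 3}` and `val = {1, 2, 3, 4, 5, 6}`. With x = (1, 2, 3, 4), `spmv_csr_spm` produces y = (7, 12, 38). The gather through `col_idx` is the irregular access pattern that the abstract identifies as the source of cache misses. Serving x from a fixed-latency buffer is the idea being evaluated.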

Keywords: sparse matrix vector multiplication; scratchpad memory; compressed sparse row (CSR); ARM scalable vector extension (SVE)

Zhang Zongmao (张宗茂), Dong Dezun (董德尊), Wang Zicong (王子聪), Chang Junsheng (常俊胜), Zhang Xiaoyun (张晓云), Wang Shaocong (王绍聪)


College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073, China


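Journal: Computer Engineering & Science (计算机工程与科学)

Sponsor: College of Computer Science and Technology, National University of Defense Technology

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.787

ISSN: 1007-130X

Year, volume (issue): 2024, 46(9)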