Artificial-intelligence accelerators such as the NPU (neural processing unit) and the GPGPU (general-purpose graphics processing unit) are needed to achieve fast computation for artificial intelligence and high-performance computing across many fields. Since matrix operations are the core operations of both artificial intelligence and high-performance computing, a resource-efficient matrix operation unit architecture is proposed. The matrix operation unit is accelerated by expanding the number of multipliers and adders in each of its sub-units and by broadcasting the input data to every sub-unit by row and by column. By sharing data between PE matrices and adopting a new PE-matrix interconnection scheme, the design reduces bandwidth consumption while increasing computing power. Compared with existing matrix-operation implementations in NPUs and GPGPUs, the proposed scheme achieves the same computing power with fewer adders and registers, and accelerates matrix operations of the same scale with lower clock latency and bandwidth consumption.
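The row/column broadcast idea can be illustrated with a minimal sketch (this is an illustrative model, not the paper's exact hardware design): each cycle, one element of A is broadcast along its row of processing elements and one element of B along its column, and every PE multiplies the two broadcast values and accumulates the product in its own register. The function name `pe_grid_matmul` is an assumption for illustration.

```python
# Hedged sketch of an output-stationary PE grid with row/column broadcast.
# Each cycle k: A[i][k] is driven onto the bus of row i, B[k][j] onto the
# bus of column j; PE(i, j) computes row_bus[i] * col_bus[j] and adds it
# to its local accumulator register, so no partial sums move between PEs.

def pe_grid_matmul(A, B):
    m, K = len(A), len(A[0])
    n = len(B[0])
    acc = [[0] * n for _ in range(m)]  # one accumulator register per PE
    for k in range(K):  # one broadcast step per "cycle"
        row_bus = [A[i][k] for i in range(m)]  # value shared by all PEs in row i
        col_bus = [B[k][j] for j in range(n)]  # value shared by all PEs in column j
        for i in range(m):
            for j in range(n):
                acc[i][j] += row_bus[i] * col_bus[j]
    return acc

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(pe_grid_matmul(A, B))  # [[19, 22], [43, 50]]
```

Because each operand element is fetched once and shared by an entire row or column of PEs, an m-by-n grid needs only m + n input values per cycle rather than m * n, which is the bandwidth saving the broadcast scheme exploits.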