MRNDA: A Multicast Mechanism for Resource-Constrained NoC-Based Deep Neural Network Accelerators
Ouyang Yiming 1, Wang Qi 2, Tang Feiyang 1, Zhou Wu 1, Li Jianhua 1
Author Information
- 1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui 230009, China
- 2. School of Microelectronics, Hefei University of Technology, Hefei, Anhui 230009, China
Abstract
Network-on-Chip (NoC) interconnects are widely used in multiprocessor systems, and NoC-based deep neural network (DNN) accelerators have been proposed in recent years. By using an NoC to connect the neural computing elements, such accelerators greatly reduce off-chip memory accesses and thereby lower classification latency and power consumption. However, with a traditional unicast NoC, the large number of one-to-many packet transfers significantly increases communication latency. Moreover, modern DNN models are often very large, while the number of cores in an NoC is limited. This paper therefore proposes MRNDA, a multicast mechanism for resource-constrained NoC-based DNN accelerators. MRNDA computes large DNN models on a limited number of processor elements (PEs) and employs a dedicated tree-based multicast acceleration network to reduce the accelerator's communication latency. Simulation results show that, compared with the baseline, the proposed multicast mechanism reduces the accelerator's classification latency by up to 86.7% and its communication latency by up to 88.8%, while its router occupies only 9.5% of the area and consumes only 10.3% of the power of the baseline router.
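The abstract attributes the latency reduction to replacing repeated unicasts with a tree-based multicast network, but does not detail the mechanism itself. The sketch below is therefore only a minimal, generic illustration (not the paper's MRNDA design) of why a shared multicast tree in a 2D-mesh NoC traverses far fewer links than per-destination unicasts; the XY routing discipline, function names, and mesh topology are all assumptions made for illustration.

```python
# Illustrative sketch only: generic dimension-ordered (XY) multicast in a
# 2D-mesh NoC, NOT the MRNDA mechanism proposed in the paper. It compares
# the link traversals of repeated unicasts against a shared delivery tree
# in which branch routers replicate the flit.

def xy_route(src, dst):
    """Return the list of links of a deterministic XY route from src to dst."""
    (x, y), (dx, dy) = src, dst
    hops = []
    while x != dx:                       # route along the X dimension first
        nx = x + (1 if dx > x else -1)
        hops.append(((x, y), (nx, y)))
        x = nx
    while y != dy:                       # then along the Y dimension
        ny = y + (1 if dy > y else -1)
        hops.append(((x, y), (x, ny)))
        y = ny
    return hops

def traffic(src, dsts):
    """Compare link traversals: repeated unicast vs. a shared multicast tree."""
    unicast = sum(len(xy_route(src, d)) for d in dsts)
    tree = set()                         # XY routes from one source share
    for d in dsts:                       # prefixes, so the union forms a tree
        tree.update(xy_route(src, d))    # whose links are each traversed once
    return unicast, len(tree)

if __name__ == "__main__":
    src = (0, 0)                         # one-to-many transfer in a 4x4 mesh
    dsts = [(3, y) for y in range(4)] + [(x, 3) for x in range(3)]
    u, m = traffic(src, dsts)
    print(f"unicast link traversals: {u}, multicast tree links: {m}")
    # prints 30 vs. 15: the tree halves the traffic even in this tiny example
```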
Keywords
network-on-chip / deep neural network accelerator / multicast / router architecture / multiple physical networks
Funding
National Natural Science Foundation of China (61874157)
National Natural Science Foundation of China (71971151)
Publication Year
2024