Exploration of a many-core dataflow hardware architecture based on the Actor model
The distributed training of ultra-large-scale AI models poses challenges to the communication capability and scalability of chip architectures. Wafer-level chips integrate a large number of computing cores and interconnect networks on the same wafer, achieving ultra-high computing density and communication performance, which makes them an ideal choice for training ultra-large-scale AI models. AMCoDA is a hardware architecture based on the Actor model; it aims to leverage the highly parallel, asynchronous message-passing, and scalable characteristics of the Actor parallel programming model to achieve distributed training of AI models on wafer-level chips. The design of AMCoDA spans three levels: the computational model, the execution model, and the hardware architecture. Experiments show that AMCoDA broadly supports various parallel patterns and collective communications in distributed training, and can flexibly and efficiently deploy and execute complex distributed training strategies.
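To make the Actor model's "highly parallel, asynchronous message passing" concrete, the following is a minimal sketch in Python, not AMCoDA's actual implementation: each actor owns a private mailbox and a worker thread, and senders never block waiting for the receiver. All class and variable names here are illustrative.

```python
import queue
import threading

class Actor:
    """Minimal actor: a private mailbox plus a worker thread that
    processes one message at a time, asynchronously from senders."""

    def __init__(self, handler):
        self._mailbox = queue.Queue()   # thread-safe FIFO mailbox
        self._handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, msg):
        # Non-blocking: the sender enqueues and continues immediately.
        self._mailbox.put(msg)

    def stop(self):
        # Sentinel message shuts the actor down; join waits for drain.
        self._mailbox.put(None)
        self._thread.join()

    def _run(self):
        while True:
            msg = self._mailbox.get()
            if msg is None:
                break
            self._handler(msg)

# Illustrative use: an accumulator actor summing partial gradients,
# loosely analogous to one step of a reduction in distributed training.
received = []
accumulator = Actor(lambda grad: received.append(grad))
for grad in [1.0, 2.0, 3.0]:
    accumulator.send(grad)   # asynchronous: no waiting on the receiver
accumulator.stop()
print(sum(received))         # 6.0
```

In a wafer-level setting the mailbox and handler would map onto on-chip cores and the interconnect rather than OS threads, but the programming contract is the same: state is private to each actor, and all coordination happens through messages.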