Exploration of a many-core dataflow hardware architecture based on the Actor model
The distributed training of ultra-large-scale AI models poses challenges to the communication capability and scalability of chip architectures. Wafer-level chips integrate a large number of computing cores and interconnect networks on the same wafer, achieving ultra-high computing density and communication performance, which makes them an ideal choice for training ultra-large-scale AI models. AMCoDA is a hardware architecture based on the Actor model; it aims to leverage the highly parallel, asynchronous message-passing, and scalable characteristics of the Actor parallel programming model to achieve distributed training of AI models on wafer-level chips. The design of AMCoDA spans three levels: the computational model, the execution model, and the hardware architecture. Experiments show that AMCoDA broadly supports various parallel patterns and collective communications in distributed training, and can flexibly and efficiently deploy and execute complex distributed training strategies.
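To make the Actor model's "highly parallel, asynchronous message passing" concrete, the following is a minimal sketch in Python, not AMCoDA's actual implementation: each actor owns a private mailbox and a worker thread, and senders never block waiting for the receiver. All class and variable names here are illustrative.

```python
import queue
import threading

class Actor:
    """Minimal actor: a private mailbox plus a worker thread that
    processes one message at a time, asynchronously from senders."""

    def __init__(self, handler):
        self._mailbox = queue.Queue()   # thread-safe FIFO mailbox
        self._handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, msg):
        # Non-blocking: the sender enqueues and continues immediately.
        self._mailbox.put(msg)

    def stop(self):
        # Sentinel message shuts the actor down; join waits for drain.
        self._mailbox.put(None)
        self._thread.join()

    def _run(self):
        while True:
            msg = self._mailbox.get()
            if msg is None:
                break
            self._handler(msg)

# Illustrative use: an accumulator actor summing partial gradients,
# loosely analogous to one step of a reduction in distributed training.
received = []
accumulator = Actor(lambda grad: received.append(grad))
for grad in [1.0, 2.0, 3.0]:
    accumulator.send(grad)   # asynchronous: no waiting on the receiver
accumulator.stop()
print(sum(received))         # 6.0
```

In a wafer-level setting the mailbox and handler would map onto on-chip cores and the interconnect rather than OS threads, but the programming contract is the same: state is private to each actor, and all coordination happens through messages.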