Research on the Development of Intelligent Computing Networks for Large Models
Guo Liang (郭亮) 1, Wang Shaopeng (王少鹏) 2, Quan Wei (权伟) 3, Li Jie (李洁) 2
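Author information
- 1. Cloud Computing and Big Data Research Institute, China Academy of Information and Communications Technology, Beijing 100191; School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044
- 2. Cloud Computing and Big Data Research Institute, China Academy of Information and Communications Technology, Beijing 100191
- 3. School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044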
Abstract
In recent years, the world has entered a period of rapid growth in intelligent computing. Large models are deep learning models with huge numbers of parameters and complex structures, and training them requires fast synchronization of training parameters across multiple cards and servers. This places higher demands on the bandwidth, latency, reliability, scalability, and security of data center networks. This paper studies the requirements and key technologies of intelligent computing networks for large model training, and analyzes the research results, standards and specifications, and case practices of intelligent computing networks, with the aim of further promoting their development.
Key words
large model / intelligent computing center / network technology
Funding
National Science and Technology Major Project for New Generation Artificial Intelligence (2021ZD0113003)
Publication year
2024