Research on intelligent computing network technology for large-scale pre-trained models
WANG Xuecong¹, JI Siwei¹, LI Cong¹
Author information
- 1. Research Institute of China Telecom Co., Ltd., Beijing 102209, China
Abstract
With the development of artificial intelligence, large-scale pre-trained models have achieved significant results in fields such as natural language processing and computer vision, which has driven the construction of intelligent computing centers. This paper studies the key technologies of intelligent computing networks for large-scale model pre-training. It systematically reviews the latest standardization progress of intelligent computing networks in China and abroad, proposes a target architecture for intelligent computing networks, and explores the principles of key technologies, including remote direct memory access (RDMA), InfiniBand (IB), RDMA over Converged Ethernet (RoCE), and collective communication. It also analyzes the current problems and future development trends of intelligent computing networks. The work is significant for advancing intelligent computing network technology and for guiding the construction of intelligent computing centers.
Key words
intelligent computing network / remote direct memory access (RDMA) / large-scale model
Funding
National Key Research and Development Program of China (2023YFB2904100)
Year of publication
2024