ZTE Technology Journal, 2024, Vol. 30, Issue 2: 50-55. DOI: 10.12142/ZTETJ.202402008

Network Architecture and Technologies for Large Generative Models

TANG Hong (唐宏), WU Juan (武娟), XU Xiaoqing (徐晓青), ZHANG Ning (张宁)

Author information

  • 1. Research Institute of China Telecom Co., Ltd., Guangzhou 510630, China

Abstract

Training large generative models demands an ultra-large-scale, low-latency, high-bandwidth, high-availability network infrastructure. This paper investigates the technology roadmap and implementation schemes of high-performance network infrastructure for large generative models, and argues that commercial deployments need customized network architecture design and transport protocol optimization tailored to the workloads and traffic patterns of each training stage. Flow control and congestion control, load balancing, automated operation and maintenance, and deterministic network transmission for wide-area remote direct memory access (RDMA) are identified as key directions for future research.

Keywords

large generative model / RDMA / network congestion control / network load balancing

Publication year

2024
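ZTE Technology Journal
ZTE Corporation; Anhui Science and Technology Information Research Institute

Indexing: CSTPCD, Peking University Core Journals (北大核心)
Impact factor: 1.272
ISSN: 1009-6868
Number of references: 21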