A Review of the Evolution of Large Language Model Algorithms
朱炫鹏 1, 姚海东 1, 刘隽 1, 熊先奎 1
Author information
- 1. ZTE Corporation, Shenzhen 518057, China
Abstract
Large language models based on the Transformer architecture demonstrate powerful capabilities and represent a major step toward artificial general intelligence (AGI). The evolution of large language model architectures and algorithms follows two technical routes: improving inference efficiency and improving model capability. This paper introduces the mainstream technical solutions and ideas along both routes. Methods for improving inference efficiency include distributed inference, computation optimization, memory access optimization, and quantization. Model capability is improved mainly by introducing new architectures, such as mixture of experts (MoE) models and state space models (SSM).
Key words
large language model/Transformer/attention
Publication year
2024