Review of Evolution of Large Language Model Algorithms

朱炫鹏¹, 姚海东¹, 刘隽¹, 熊先奎¹

1. ZTE Corporation, Shenzhen 518057, China

Abstract

Large language models (LLMs) built on the Transformer architecture have shown powerful capabilities and represent a major step toward artificial general intelligence (AGI). The evolution of LLM architectures and algorithms follows two technical routes: improving inference efficiency and improving model capability. This paper describes the mainstream technical solutions and ideas along both routes. Methods for improving inference efficiency include distributed inference, computation optimization, memory-access optimization, and quantization. Model capability is improved mainly by introducing new architectures, such as mixture-of-experts (MoE) models and state space models (SSMs).

Key words

large language model/Transformer/attention

Year of publication: 2024

ZTE Technology Journal (中兴通讯技术)

ZTE Corporation; Anhui Institute of Scientific and Technical Information

CSTPCD; Peking University Core Journal

Impact factor: 1.272

ISSN: 1009-6868

Number of references: 59
