A Dedicated Hardware Architecture for Large Language Models Based on Integrated Compute-in-Memory Chiplets
He Siqi¹, Mu Chen¹, Chen Chixiao¹
Abstract
Artificial intelligence (AI) models represented by ChatGPT are showing exponential growth in parameter size and system computing power requirements. This paper studies a dedicated hardware architecture for large models, providing a detailed analysis of the bandwidth bottleneck that large models face during deployment and of the significant impact this challenge has on current data centers. To address the issue, a solution based on integrated compute-in-memory chiplets is proposed, aiming to alleviate data transmission pressure and improve the energy efficiency of large-model inference. In addition, the possibility of co-designing model lightweighting with in-memory compression under the compute-in-memory architecture is studied, in order to achieve dense mapping of sparse networks onto compute-in-memory hardware, thereby significantly improving storage density and computational energy efficiency.
Key words
large language model / compute-in-memory / chiplet / in-memory compression
Funding
National Natural Science Foundation of China (62322404)
Fudan University-ZTE Joint Laboratory for Strong Computing Architecture Research, Compute-in-Memory Architecture Research Project
Publication year
2024