ZTE Technology Journal, 2024, Vol. 30, Issue 2: 37-42. DOI: 10.12142/ZTETJ.202402006

Large Language Model Specific Hardware Architecture Based on Integrated Compute-in-Memory Chips

何斯琪 1, 穆琛 1, 陈迟晓 1

Author Information

  • 1. Fudan University, Shanghai 200433, China

Abstract

Artificial intelligence (AI) large models, represented by ChatGPT, are showing exponential growth in parameter size and system computing power requirements. This paper studies dedicated hardware architectures for large models, providing a detailed analysis of the bandwidth bottleneck faced by large models during deployment and the significant impact of this challenge on current data centers. To address this issue, a solution based on integrated compute-in-memory chiplets is proposed, aiming to alleviate data-transmission pressure and improve the energy efficiency of large-model inference. In addition, the possibility of lightweight/in-memory-compression co-design under the compute-in-memory architecture is studied, in order to achieve dense mapping of sparse networks on compute-in-memory hardware, thereby significantly improving storage density and computational energy efficiency.
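The "dense mapping of sparse networks" mentioned in the abstract can be illustrated with a toy sketch. The snippet below is an illustrative assumption, not the paper's actual scheme, and all function names are hypothetical: it packs a 2:4 structured-sparse weight matrix into dense columns (halving the stored array, analogous to raising storage density in a compute-in-memory array), while a small index map selects the matching inputs at compute time.

```python
import numpy as np

def compress_2to4(w):
    """Keep the 2 largest-magnitude weights in each group of 4 columns.

    Returns the packed (dense) values and the index map needed to
    route the correct inputs to them at compute time.
    """
    rows, cols = w.shape
    assert cols % 4 == 0, "columns must group into blocks of 4"
    groups = w.reshape(rows, cols // 4, 4)
    # Indices of the 2 largest-magnitude entries per group, kept in order.
    idx = np.sort(np.argsort(-np.abs(groups), axis=2)[:, :, :2], axis=2)
    vals = np.take_along_axis(groups, idx, axis=2)  # packed weights, half size
    return vals, idx

def sparse_matvec(vals, idx, x):
    """Compute y = W_pruned @ x using only packed values plus the index map."""
    rows, ngroups, _ = vals.shape
    # Group the input vector the same way and replicate it per output row.
    xg = np.tile(x.reshape(ngroups, 4), (rows, 1, 1))   # (rows, ngroups, 4)
    sel = np.take_along_axis(xg, idx, axis=2)           # inputs matching kept weights
    # Multiply-accumulate per row, as a crossbar column would.
    return np.einsum('rgk,rgk->r', vals, sel)
```

For a matrix that already satisfies the 2:4 pattern, the packed computation reproduces the full matrix-vector product exactly while storing only half the weights; for a general matrix it realizes the pruned network's output.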

Key words

large language model / compute-in-memory / chiplet / in-memory compression


Funding

National Natural Science Foundation of China (62322404)

Fudan University-ZTE Joint Laboratory for Strong Computing Architecture, research project on compute-in-memory architecture ()

Publication Year

2024

Journal: ZTE Technology Journal (中兴通讯技术)
Publisher: ZTE Corporation; Anhui Institute of Scientific and Technical Information
Indexing: CSTPCD; Peking University Core Journal
Impact factor: 1.272
ISSN: 1009-6868
References: 8