Science in China Series F (中国科学F辑), 2024, Vol. 54, Issue 8: 1827-1842. DOI: 10.1360/SSI-2023-0345

An energy-efficient encoding and decoding mechanism for memristor-based processing-in-memory architectures

A general yet accurate approach for energy-efficient processing-in-memory architecture computations

Yu Huang (黄禹) 1, Long Zheng (郑龙) 1, Haifeng Liu (刘海峰) 1, Qihang Qiu (邱启航) 2, Jie Xin (辛杰) 3, Xiaofei Liao (廖小飞) 3, Hai Jin (金海) 3

Author information

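  • 1. National Engineering Research Center for Big Data Technology and System, Huazhong University of Science and Technology, Wuhan 430074; Key Laboratory of Services Computing Technology and System, Ministry of Education, Huazhong University of Science and Technology, Wuhan 430074; Hubei Key Laboratory of Cluster and Grid Computing, Huazhong University of Science and Technology, Wuhan 430074; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074; Zhejiang Lab, Hangzhou 311121
  • 2. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074
  • 3. National Engineering Research Center for Big Data Technology and System, Huazhong University of Science and Technology, Wuhan 430074; Key Laboratory of Services Computing Technology and System, Ministry of Education, Huazhong University of Science and Technology, Wuhan 430074; Hubei Key Laboratory of Cluster and Grid Computing, Huazhong University of Science and Technology, Wuhan 430074; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074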

Abstract (translated from Chinese)

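In recent years, processing-in-memory architectures, represented by memristors, have been widely studied to accelerate a variety of applications and are expected to break the memory wall faced by the von Neumann architecture. This paper observes that the energy consumption of memristor computing operations is asymmetric: operating on a memristor cell in the low resistance state can consume several orders of magnitude more energy than operating on one in the high resistance state. This creates an opportunity to save computational energy by reducing the number of cells in the low resistance state. To this end, the paper proposes a general and efficient memristor encoding and decoding mechanism that can be seamlessly integrated into existing accelerators without affecting their computation results. On the encoding side, a subtraction-based encoder converts low-resistance-state cells into high-resistance-state cells, and the encoding problem is formulated as a graph traversal problem to obtain the optimal encoding. On the decoding side, a lightweight hardware decoder restores the encoded computation results while avoiding additional computation time overhead. Experimental results show that the scheme performs well in multiple domains such as machine learning and graph computing, achieving energy savings of up to 31.3% and 56.0%, respectively.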

Abstract

Resistive random-access memory (ReRAM) promises to break the memory wall thanks to its processing-in-memory capability and is widely studied as a way to accelerate various applications. The energy consumption of ReRAM-based accelerators stems mainly from ADCs/DACs and from computational operations on ReRAM crossbars. The former has been studied extensively in recent years, so the energy bottleneck has shifted to ReRAM operations. In this paper, we observe an asymmetry in the energy consumption of ReRAM operations: operating on a ReRAM cell in the low resistance state (LRS) can consume several orders of magnitude more energy than operating on a cell in the high resistance state (HRS). This opens an opportunity to save computational energy by reducing the number of LRS cells. To this end, we propose a general, energy-efficient ReRAM-based computation scheme that can be seamlessly integrated into existing ReRAM-based accelerators without affecting their computation results. The key insight is to reduce the number of LRS cells by converting them into HRS cells. The scheme implements the LRS-to-HRS encoding through a subtraction-based encoder and formulates the encoding problem as a graph traversal problem to achieve optimized results. It is also equipped with a lightweight hardware-based decoder that restores the encoded computation results. We have evaluated our approach on neural network and graph processing workloads running on ReRAM-based accelerators, and the results show that it achieves energy savings of up to 31.3% and 56.0%, respectively.
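
To make the idea of trading LRS cells for HRS cells concrete, the following is a minimal sketch of one simple subtraction-style scheme, column-wise complement encoding. It is an illustration only and is not the encoder described in the paper, which the abstract says is optimized through a graph traversal formulation. The mapping of bit 1 to LRS, the per-column flag, and all function names are assumptions made for this example.

```python
# Illustrative only: column-wise complement encoding, NOT the paper's encoder.
# Assumption: a cell storing bit 1 is in the low resistance state (LRS).

def encode_column(bits):
    """Store the complement when more than half of the cells would be LRS."""
    flipped = sum(bits) * 2 > len(bits)
    stored = [1 - b for b in bits] if flipped else list(bits)
    return stored, flipped

def crossbar_dot(inputs, stored):
    """Software stand-in for the analog multiply-accumulate along one column."""
    return sum(x * s for x, s in zip(inputs, stored))

def decode(raw, inputs, flipped):
    """Undo the complement using x.w = sum(x) - x.(1 - w)."""
    return sum(inputs) - raw if flipped else raw

weights = [1, 1, 0, 1, 1, 1, 0, 1]   # 6 of 8 cells would be LRS
inputs = [3, 0, 2, 1, 4, 0, 5, 2]

stored, flipped = encode_column(weights)
result = decode(crossbar_dot(inputs, stored), inputs, flipped)
```

In this example the stored column holds 2 LRS cells instead of 6, and the decoded result equals the dot product with the original weights, so the computation result is unchanged. The decoder needs only the sum of the inputs and one subtraction per flipped column.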

Keywords (translated from Chinese)

processing-in-memory/memristor/accelerator/high energy efficiency/machine learning/graph computing

Key words

processing in memory/memristor/accelerator/energy efficiency/machine learning/graph processing


Funding

National Key Research and Development Program of China (2023YFB4503400)

Publication year

2024
Science in China Series F (中国科学F辑)
Chinese Academy of Sciences; National Natural Science Foundation of China

Indexed in: CSTPCD, Peking University Core Journals (北大核心)
Impact factor: 1.438
ISSN: 1674-5973
Number of references: 2