
Attar: RRAM-based in-memory attention accelerator with software-hardware co-optimization

The attention mechanism has become a pivotal component in artificial intelligence, significantly enhancing the performance of deep learning applications. However, its quadratic computational complexity and intricate computations lead to substantial inefficiencies when processing long sequences. To address these challenges, we introduce Attar, a resistive random access memory (RRAM)-based in-memory accelerator designed to optimize attention mechanisms through software-hardware co-optimization. Attar leverages efficient Top-k pruning and quantization strategies to exploit the sparsity and redundancy of attention matrices, and incorporates an RRAM-based in-memory softmax engine by harnessing the versatility of the RRAM crossbar. Comprehensive evaluations demonstrate that Attar achieves a performance improvement of up to 4.88× and energy savings of 55.38% over previous computing-in-memory (CIM)-based accelerators across various models and datasets while maintaining comparable accuracy. This work underscores the potential of in-memory computing to enhance the efficiency of attention-based models without compromising their effectiveness.
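The Top-k pruning idea from the abstract is straightforward to illustrate in software. Below is a minimal NumPy sketch, assuming the score matrix is pruned per query row before the softmax; the function name, masking scheme, and toy dimensions are illustrative assumptions and do not reflect Attar's actual hardware dataflow, quantization pipeline, or in-memory softmax engine.

```python
import numpy as np

def topk_pruned_attention(Q, K, V, k):
    """Scaled dot-product attention in which only the k largest scores
    per query row survive; all others are masked to -inf before the
    softmax, so they receive exactly zero attention weight."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (n_q, n_k) score matrix
    # Column indices of the k largest scores in each row (unordered).
    topk_idx = np.argpartition(scores, -k, axis=-1)[:, -k:]
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, topk_idx, 0.0, axis=-1)
    pruned = scores + mask                         # non-top-k entries -> -inf
    # Numerically stable softmax over the k surviving entries per row.
    pruned -= pruned.max(axis=-1, keepdims=True)
    weights = np.exp(pruned)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy example: sequence length 8, head width 16, keep top 3 scores per query.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
print(topk_pruned_attention(Q, K, V, k=3).shape)   # (8, 16)
```

Because each softmax row ends up with only k nonzero weights, the subsequent weighted sum over V touches just k value rows per query, which is the sparsity that an accelerator like Attar can exploit.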


Bing LI, Ying QI, Ying WANG, Yinhe HAN


Information Engineering College, Capital Normal University, Beijing 100048, China

Research Center for Intelligent Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

RRAM, computing-in-memory, attention, pruning, quantization

2025

Science China Information Sciences
Chinese Academy of Sciences

Impact factor: 0.715
ISSN: 1674-733X
Year, Volume (Issue): 2025, 68(3)