
Attar: RRAM-based in-memory attention accelerator with software-hardware co-optimization

The attention mechanism has become a pivotal component in artificial intelligence, significantly enhancing the performance of deep learning applications. However, its quadratic computational complexity and intricate computations lead to substantial inefficiencies when processing long sequences. To address these challenges, we introduce Attar, a resistive random access memory (RRAM)-based in-memory accelerator designed to optimize attention mechanisms through software-hardware co-optimization. Attar leverages efficient Top-k pruning and quantization strategies to exploit the sparsity and redundancy of attention matrices, and incorporates an RRAM-based in-memory softmax engine by harnessing the versatility of the RRAM crossbar. Comprehensive evaluations demonstrate that Attar achieves a performance improvement of up to 4.88× and energy savings of 55.38% over previous computing-in-memory (CIM)-based accelerators across various models and datasets while maintaining comparable accuracy. This work underscores the potential of in-memory computing to enhance the efficiency of attention-based models without compromising their effectiveness.
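The Top-k pruning idea from the abstract is straightforward to illustrate in software. Below is a minimal NumPy sketch, assuming the score matrix is pruned per query row before the softmax; the function name, masking scheme, and toy dimensions are illustrative assumptions and do not reflect Attar's actual hardware dataflow, quantization pipeline, or in-memory softmax engine.

```python
import numpy as np

def topk_pruned_attention(Q, K, V, k):
    """Scaled dot-product attention in which only the k largest scores
    per query row survive; all others are masked to -inf before the
    softmax, so they receive exactly zero attention weight."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (n_q, n_k) score matrix
    # Column indices of the k largest scores in each row (unordered).
    topk_idx = np.argpartition(scores, -k, axis=-1)[:, -k:]
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, topk_idx, 0.0, axis=-1)
    pruned = scores + mask                         # non-top-k entries -> -inf
    # Numerically stable softmax over the k surviving entries per row.
    pruned -= pruned.max(axis=-1, keepdims=True)
    weights = np.exp(pruned)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy example: sequence length 8, head width 16, keep top 3 scores per query.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
print(topk_pruned_attention(Q, K, V, k=3).shape)   # (8, 16)
```

Because each softmax row ends up with only k nonzero weights, the subsequent weighted sum over V touches just k value rows per query, which is the sparsity that an accelerator like Attar can exploit.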


Bing LI, Ying QI, Ying WANG, Yinhe HAN


Information Engineering College, Capital Normal University, Beijing 100048, China

Research Center for Intelligent Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

RRAM, computing-in-memory, attention, pruning, quantization

2025

Science China Information Sciences
Chinese Academy of Sciences

Impact factor: 0.715
ISSN: 1674-733X
Year, Volume (Issue): 2025, 68(3)