Attar: RRAM-based in-memory attention accelerator with software-hardware co-optimization
The attention mechanism has become a pivotal component in artificial intelligence, significantly enhancing the performance of deep learning applications. However, its quadratic computational complexity and intricate computations lead to substantial inefficiencies when processing long sequences. To address these challenges, we introduce Attar, a resistive random access memory (RRAM)-based in-memory accelerator designed to optimize attention mechanisms through software-hardware co-optimization. Attar leverages efficient Top-k pruning and quantization strategies to exploit the sparsity and redundancy of attention matrices, and incorporates an RRAM-based in-memory softmax engine by harnessing the versatility of the RRAM crossbar. Comprehensive evaluations demonstrate that Attar achieves a performance improvement of up to 4.88× and energy savings of 55.38% over previous computing-in-memory (CIM)-based accelerators across various models and datasets while maintaining comparable accuracy. This work underscores the potential of in-memory computing to enhance the efficiency of attention-based models without compromising their effectiveness.
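For intuition, the Top-k pruning idea the abstract refers to can be sketched in a few lines of Python. This is a minimal software illustration under assumed details: the function name, the torch-based formulation, and the default k are illustrative choices, not Attar's actual hardware-aware scheme or its quantization pipeline.

```python
import torch

def topk_attention(q, k, v, top_k=32):
    """Illustrative Top-k attention pruning (hypothetical, not Attar's exact method).

    Keeps only the top_k largest scores per query row before softmax and masks
    the rest, exploiting the sparsity of the attention matrix that the
    abstract describes; masked entries receive zero attention weight.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # (..., L_q, L_k) scaled dot-product scores
    top_k = min(top_k, scores.size(-1))                # guard against short sequences
    kth = scores.topk(top_k, dim=-1).values[..., -1:]  # k-th largest score in each query row
    scores = scores.masked_fill(scores < kth, float('-inf'))  # prune sub-threshold entries
    return torch.softmax(scores, dim=-1) @ v           # softmax over survivors, weighted sum of values
```

In an in-memory design, the masked entries correspond to crossbar operations that can be skipped outright, which is where the performance and energy gains of this kind of pruning would come from.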