SMCA: A Framework for Scaling Chiplet-Based Computing-in-Memory Accelerators
Computing-in-Memory (CiM) architectures based on Resistive Random Access Memory (ReRAM) are recognized as a promising solution for accelerating deep learning applications. As intelligent applications continue to evolve, deep learning models grow ever larger, imposing higher computational and storage demands on processing platforms. However, due to the non-idealities of ReRAM, large-scale ReRAM-based computing systems suffer from low yield and poor reliability. Chiplet-based architectures assemble multiple small chiplets into a single package, offering higher fabrication yield and lower manufacturing cost, and have become a major trend in chip design. However, compared with on-chip wiring, the expensive inter-chiplet communication becomes a performance bottleneck that limits the scalability of chiplet-based systems. As a countermeasure, this paper proposes SMCA (SMT-based CiM chiplet Acceleration), a novel scaling framework for chiplet-based CiM accelerators. SMCA comprises an adaptive deep-learning task-partitioning strategy and an automated SMT-based workload deployment that together generate the most energy-efficient DNN workload schedule with minimal data transmission on chiplet-based deep learning accelerators, effectively improving system performance and efficiency. Experimental results show that, compared with existing strategies, the schedule automatically generated by SMCA reduces the energy cost of inter-chiplet communication by 35%.
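To make the deployment problem concrete, the sketch below models a toy version of what SMCA's SMT formulation solves: assign a linear chain of DNN layers to chiplets so that per-chiplet storage capacity is respected while inter-chiplet data transmission is minimized. All numbers (layer memory demands, activation volumes, chiplet count, capacity) are hypothetical, and exhaustive enumeration stands in for the SMT solve, which explores the same assignment space symbolically.

```python
from itertools import product

# Hypothetical toy workload: a linear chain of four DNN layers.
# layer_mem[i] = weight-storage demand of layer i (arbitrary units)
# edge_data[i] = activation volume passed from layer i to layer i+1
layer_mem = [4, 3, 5, 2]
edge_data = [8, 2, 6]
n_chiplets, capacity = 2, 8  # assumed chiplet count and per-chiplet capacity

def comm_cost(assign):
    """Inter-chiplet traffic: sum of edges whose endpoints sit on different chiplets."""
    return sum(d for i, d in enumerate(edge_data) if assign[i] != assign[i + 1])

def feasible(assign):
    """Each chiplet's total layer memory must fit within its capacity."""
    return all(sum(m for m, c in zip(layer_mem, assign) if c == k) <= capacity
               for k in range(n_chiplets))

# Brute-force search over all layer-to-chiplet assignments; an SMT solver
# would encode feasibility as constraints and minimize comm_cost instead.
best = min((a for a in product(range(n_chiplets), repeat=len(layer_mem))
            if feasible(a)), key=comm_cost)
print(best, comm_cost(best))
```

Here the cheapest feasible schedule cuts only the low-volume edge between layers 1 and 2, illustrating why joint partition-and-placement can sharply reduce inter-chiplet traffic relative to capacity-only packing.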