High Technology Letters (English edition), 2024, Vol. 30, Issue 1: 52-60. DOI: 10.3772/j.issn.1006-6748.2024.01.006

Cambricon-QR: a sparse and bitwise reproducible quantized training accelerator

LI Nan (李楠) 1, ZHAO Yongwei 2, ZHI Tian 2, LIU Chang 3, DU Zidong 2, HU Xing 2, LI Wei 2, ZHANG Xishan 3, LI Ling 4, SUN Guangzhong 5

Author information

  • 1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, P.R. China; State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100086, P.R. China; Cambricon Tech. Ltd, Beijing 100191, P.R. China
  • 2. State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100086, P.R. China
  • 3. State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100086, P.R. China; Cambricon Tech. Ltd, Beijing 100191, P.R. China
  • 4. Institute of Software, Chinese Academy of Sciences, Beijing 100086, P.R. China
  • 5. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, P.R. China

Abstract

Quantized training has proven to be a prominent method for training deep neural networks under limited computational resources. It uses low bit-width arithmetic with a proper scaling factor to achieve negligible accuracy loss. Cambricon-Q is an ASIC design proposed to support quantized training efficiently, and it achieves significant performance improvement. However, two caveats remain in the design. First, Cambricon-Q with different hardware specifications may produce different numerical errors, resulting in non-reproducible behaviors, which may become a major concern in critical applications. Second, Cambricon-Q cannot leverage data sparsity, where considerable cycles could still be squeezed out. To address these caveats, the acceleration core of Cambricon-Q is redesigned to support fine-grained irregular data processing. The new design not only enables acceleration on sparse data, but also enables local dynamic quantization by contiguous value ranges (which is hardware independent) instead of contiguous addresses (which depends on hardware factors). Experimental results show that the accuracy loss of the method remains negligible, and the accelerator achieves a 1.61× performance improvement over Cambricon-Q, with an energy increase of about 10%.
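The contrast between the two grouping schemes can be illustrated with a minimal sketch. It is not the algorithm of the paper, whose details the abstract does not give. It assumes symmetric 8-bit quantization, a hypothetical `block_size` standing in for a hardware-dependent parameter, and power-of-two magnitude buckets as one possible reading of "contiguous value ranges". All function names are illustrative.

```python
QMAX = 127  # assumed symmetric int8 range; the abstract does not fix a bit width


def quantize(x, scale):
    """Round x onto the integer grid defined by scale, then map back to float."""
    if scale == 0.0:
        return 0.0
    q = max(-QMAX, min(QMAX, round(x / scale)))
    return q * scale


def quantize_by_address(values, block_size):
    """One scale per block of contiguous addresses.

    block_size is a hypothetical stand-in for a hardware factor such as
    buffer or tile size, so the result changes when the hardware changes.
    """
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = max(abs(v) for v in block) / QMAX
        out.extend(quantize(v, scale) for v in block)
    return out


def quantize_by_value_range(values, num_ranges):
    """One scale per magnitude bucket (assumed power-of-two buckets).

    Bucket k covers magnitudes in (peak / 2**(k+1), peak / 2**k]; the last
    bucket also takes everything smaller. The scale of an element depends
    only on its own value and the global peak, not on where it is stored.
    """
    peak = max((abs(v) for v in values), default=0.0)
    if peak == 0.0:
        return [0.0 for _ in values]
    out = []
    for v in values:
        k = 0
        while k < num_ranges - 1 and abs(v) <= peak / 2 ** (k + 1):
            k += 1
        scale = (peak / 2 ** k) / QMAX
        out.append(quantize(v, scale))
    return out


# Small and large magnitudes interleaved in memory.
data = [0.01, 0.02, 1.0, 0.5, 0.03, 0.04, 0.9, 0.8]
narrow = quantize_by_address(data, 2)   # e.g. a device with small blocks
wide = quantize_by_address(data, 4)     # e.g. a device with larger blocks
ranged = quantize_by_value_range(data, 4)
```

In this sketch, `narrow` and `wide` differ because the small values share a scale with different neighbours under each block size, whereas `ranged` has no block-size parameter at all and gives each element the same result wherever it sits in memory.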

Key words

quantized training/sparse accelerator/Cambricon-QR


Funding

National Key Research and Development Program of China(2022YFB4501601)

National Natural Science Foundation of China(62102398)

National Natural Science Foundation of China(U20A20227)

National Natural Science Foundation of China(62222214)

National Natural Science Foundation of China(62002338)

National Natural Science Foundation of China(U22A2028)

National Natural Science Foundation of China(U19B2019)

Chinese Academy of Sciences Project for Young Scientists in Basic Research(YSBR-029)

Youth Innovation Promotion Association Chinese Academy of Sciences()

Publication year

2024
High Technology Letters (English edition)
Institute of Scientific and Technical Information of China (ISTIC)

Impact factor: 0.058
ISSN: 1006-6748
Number of references: 27