DBiT: A High-Precision Binarized ViT FPGA Accelerator

Vision Transformer (ViT) has shown great promise in image processing. However, its large parameter count and computational complexity cause inference delays, making deployment on edge devices challenging. To overcome these issues, various model compression techniques, such as quantization and distillation, have been developed. Previous studies have explored quantization and binarization of ViT, but their effectiveness in minimizing accuracy loss has been limited, and they have focused primarily on software solutions. Hardware acceleration remains underexplored, yet it is essential for boosting the inference speed of binarized networks. This paper proposes a hardware acceleration scheme for binarized ViT based on a distribution matching layer. Our approach starts with an experimental and theoretical analysis of the data distribution in binarized ViT, which motivates the introduction of a distribution matching layer after binarization. We also design a compatible model storage scheme and a hardware acceleration algorithm that improve the efficiency of weight matrix storage and computation. In addition, optimizing the large matrix multiplications within the self-attention layer significantly improves overall model speed. Experimental results show that our method improves accuracy by 10% over traditional binarized ViT approaches with learning factors, reducing the accuracy gap between the binarized and full-precision models to 4%. Furthermore, our approach achieves inference roughly 45 times faster than traditional models.
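To make the two core ideas concrete, the sketches below illustrate them under stated assumptions; they do not reproduce the paper's exact formulations, and all names (binarize, DistributionMatching, pack_signs, xnor_popcount_matmul) are hypothetical.

First, a minimal PyTorch sketch of a post-binarization distribution matching layer, assuming it takes the form of a learnable per-channel affine transform that re-aligns the statistics of the binarized activations with those of their full-precision counterparts:

    import torch
    import torch.nn as nn

    def binarize(x):
        # Sign binarization; the straight-through trick keeps gradients
        # flowing through the non-differentiable sign() during training.
        return (torch.sign(x) - x).detach() + x

    class DistributionMatching(nn.Module):
        # Hypothetical distribution matching layer: a channel-wise scale
        # and shift, learned end to end, applied right after binarization.
        def __init__(self, channels):
            super().__init__()
            self.gamma = nn.Parameter(torch.ones(channels))
            self.beta = nn.Parameter(torch.zeros(channels))

        def forward(self, x):
            x_bin = binarize(x)  # values in {-1, +1}
            return self.gamma * x_bin + self.beta

Second, a NumPy sketch of why binarization pays off in storage and computation: each {-1, +1} value is stored as one bit, and a dot product over n entries reduces to n - 2 * popcount(a XOR b), since XOR marks the positions where two sign vectors disagree. On an FPGA this maps to wide XNOR gates feeding popcount trees:

    import numpy as np

    def pack_signs(m):
        # Pack a {-1, +1} matrix into bits along the last axis: +1 -> 1,
        # -1 -> 0. This 1-bit weight format is 32x smaller than float32.
        return np.packbits(m > 0, axis=-1)

    def xnor_popcount_matmul(a_bits, b_bits, n):
        # Binarized matmul: dot(a, b) = n - 2 * popcount(a XOR b) for
        # a, b in {-1, +1}^n.
        out = np.empty((a_bits.shape[0], b_bits.shape[0]), dtype=np.int32)
        for i, row in enumerate(a_bits):
            diff = np.bitwise_xor(row, b_bits)  # sign disagreements
            disagree = np.unpackbits(diff, axis=-1).sum(axis=-1)
            out[i] = n - 2 * disagree.astype(np.int32)
        return out

    # Quick check against an integer reference (n is kept a multiple of 8
    # here so that packbits adds no padding bits).
    rng = np.random.default_rng(0)
    A = np.where(rng.standard_normal((4, 64)) >= 0, 1, -1).astype(np.int8)
    B = np.where(rng.standard_normal((3, 64)) >= 0, 1, -1).astype(np.int8)
    assert np.array_equal(A.astype(np.int32) @ B.T.astype(np.int32),
                          xnor_popcount_matmul(pack_signs(A), pack_signs(B), 64))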

Binary ViT; Distribution matching layer; FPGA hardware accelerator

Jun Gong, Wei Tao, Li Tian, Yongxin Zhu, Hui Wang

Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, PR China; University of Chinese Academy of Sciences, Beijing 100049, PR China

2025

Journal of Signal Processing Systems for Signal, Image, and Video Technology