A Research and Design of Reconfigurable CNN Co-Processor for Edge Computing
With the development of Deep Learning, the number of parameters and computation of Convolutional Neural Network (CNN) increases dramatically, which greatly raises the cost of deploying CNN algorithms on edge devices. To reduce the difficulty of the deployment and decrease the inference latency and energy consumption of CNN on the edge side, a Reconfigurable CNN Co-Processor for edge computing is proposed. Based on the data flow pattern of channel-wise processing, the proposed two-level distributed storage scheme solves the problem of power consumption overhead and performance degradation caused by large data movement between PE units and large-scale migration of intermediate data on chip. To avoid the complex data interconnection network propagation mechanism in PE arrays and reduce the complexity of control, a flexible local access mechanism and a padding mechanism based on address translation are proposed, which can perform conventional convolution, deep separable convolution, pooling and fully connected operations with great flexibility. The proposed co-processor contains 256 Processing Elements (PEs) and 176 kB on-chip SRAM. Synthesized and post-layout with 55-nm TT Corner CMOS process (25 °C,1.2 V), the CNN co-processor achieves a maximum clock frequency of 328 MHz and an area of 4.41 mm2. The peak performance of the co-processor is 163.8 GOPs at 320 MHz and the area efficiency is 37.14 GOPs/mm2, the energy efficiency of LeNet-5 and MobileNet are 210.7 GOPs/W and 340.08 GOPs/W, respectively, which is able to meet the energy-efficiency and performance requirements of edge intelligent computing scenarios.