To improve the resource utilization and inference speed of lightweight convolutional neural networks on hardware platforms, a lightweight convolutional neural network accelerator based on FPGA was proposed, following the idea of software and hardware co-optimization, and a dedicated hardware architecture was designed according to the characteristics of the network structure. Combined with a multi-level parallelism strategy, a unified convolutional computing unit was designed. Moreover, a differentiable threshold-based selective shift quantization method was proposed to reduce the storage cost and improve the throughput of the accelerator, enabling the computing unit to perform computations in a hardware-friendly form. Experimental results show that the MobileNetV2 accelerator deployed on the Arria 10 FPGA platform achieves 311 fps, which is about 9.3 times faster than the CPU version and about 3 times faster than the GPU version, with a throughput of 98.62 GOPS.
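The abstract does not spell out the quantization details, but the general idea behind shift quantization is to round each weight to a signed power of two, so that a multiplication on the FPGA reduces to a bit shift. The sketch below illustrates that idea in NumPy under stated assumptions: the function name, the fixed pruning threshold, and the log-domain rounding rule are all hypothetical, and the paper's actual method would learn the threshold through a differentiable relaxation rather than fix it.

```python
import numpy as np

def shift_quantize(weights, threshold=0.05):
    """Hypothetical sketch of selective shift quantization.

    Each weight is rounded to the nearest signed power of two
    (nearest in log2 space), so multiply becomes shift-and-add in
    hardware. Weights below `threshold` in magnitude are zeroed,
    standing in for the 'selective' part of the paper's method;
    the real method makes this threshold differentiable/learned.
    """
    w = np.asarray(weights, dtype=np.float64)
    sign = np.sign(w)
    mag = np.abs(w)
    # Nearest power-of-two exponent; the epsilon guards log2(0).
    exp = np.round(np.log2(np.maximum(mag, 1e-12)))
    q = sign * np.power(2.0, exp)
    q[mag < threshold] = 0.0  # prune sub-threshold weights
    return q, exp.astype(int)

q, exp = shift_quantize([0.8, -0.3, 0.02, 1.5])
# 0.8 -> 2^0, -0.3 -> -2^-2, 0.02 pruned to 0, 1.5 -> 2^1
```

In hardware, only the sign, the integer exponent, and a zero flag need to be stored per weight, which is where the storage savings and throughput gains claimed in the abstract would come from.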