Neural Networks, 2022, Vol. 150, 14. DOI: 10.1016/j.neunet.2022.02.024

Compression of Deep Neural Networks based on quantized tensor decomposition to implement on reconfigurable hardware platforms

Nekooei, Amirreza; Safari, Saeed

Author Information

  • 1. University of Tehran

Abstract

Deep Neural Networks (DNNs) have been widely and successfully employed in various artificial intelligence and machine learning applications (e.g., image processing and natural language processing). As DNNs become deeper and include more filters per layer, they incur high computational costs and large memory consumption to store their large number of parameters. Moreover, current processing platforms (e.g., CPU, GPU, and FPGA) do not have enough internal memory, so external memory storage is needed. Deploying DNNs in mobile applications is therefore difficult, given the limited storage space, computation power, energy supply, and real-time processing requirements. In this work, a method based on tensor decomposition is used to compress the network parameters, thereby reducing accesses to external memory. This compression method decomposes each layer's weight tensor into a limited number of principal vectors such that (i) almost all the initial parameters can be retrieved, (ii) the network structure does not change, and (iii) after the parameters are reconstructed, the network's detection accuracy remains almost the same as that of the original network. To optimize the realization of this method on FPGA, the tensor decomposition algorithm was modified such that its convergence was not affected and the reconstruction of network parameters on FPGA was straightforward. The proposed algorithm reduced the parameters of ResNet50, VGG16, and VGG19 networks trained on Cifar10 and Cifar100 by almost 10 times. (C) 2022 Elsevier Ltd. All rights reserved.
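The abstract's core idea, approximating a layer's weight tensor with a small set of principal vectors and reconstructing the weights on demand, can be illustrated with a minimal NumPy sketch. This is not the paper's quantized tensor-decomposition algorithm; the truncated-SVD factorization, the `rank` parameter, and the layer dimensions below are assumptions chosen only to make the compression/reconstruction trade-off concrete.

```python
# Illustrative sketch only: approximating a convolutional weight tensor with a
# small number of principal vectors via truncated SVD. This stands in for the
# paper's quantized tensor-decomposition method; `rank` and the layer shape
# are assumed values for demonstration.
import numpy as np

def compress_weight_tensor(W, rank):
    """Flatten a 4-D conv weight tensor to a matrix and keep `rank` principal vectors."""
    out_ch = W.shape[0]
    M = W.reshape(out_ch, -1)                  # (out_channels, in_channels * kh * kw)
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    # Store only the leading `rank` components (the "principal vectors").
    return U[:, :rank] * S[:rank], Vt[:rank, :], W.shape

def reconstruct_weight_tensor(US, Vt, shape):
    """Reproduce an approximation of the original weights from the stored factors."""
    return (US @ Vt).reshape(shape)

# Example: a VGG-style 3x3 conv layer with 256 input and 512 output channels.
W = np.random.randn(512, 256, 3, 3).astype(np.float32)
US, Vt, shape = compress_weight_tensor(W, rank=64)

stored = US.size + Vt.size
W_hat = reconstruct_weight_tensor(US, Vt, shape)
print("compression ratio: %.1fx" % (W.size / stored))
print("relative error: %.3f" % (np.linalg.norm(W - W_hat) / np.linalg.norm(W)))
```

In a deployment like the one the abstract describes, only the small factors would be stored in on-chip memory and the full weights would be reproduced layer by layer, which is what reduces external-memory accesses.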

Key words

Deep Neural Network / Tensor decomposition / Principal vectors / Network weights / External memory / Classification


Publication Year

2022

Journal

Neural Networks
Indexed in: EI, SCI
ISSN: 0893-6080
Cited by: 6
References: 51