CAAI Transactions on Intelligent Systems, 2024, Vol. 19, Issue (1): 36-57. DOI: 10.11992/tis.202306042

An Overview of Joint Optimization Methods for Neural Network Compression


NING Xin 1, ZHAO Wenyao 2, ZONG Yixin 3, ZHANG Yugui 1, CHEN Hao 4, ZHOU Qi 1, MA Junxiao 1

Author Information

  • 1. Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
  • 2. School of Microelectronics, Hefei University of Technology, Hefei 230009, China
  • 3. Bureau of Frontier Sciences and Education, Chinese Academy of Sciences, Beijing 100864, China
  • 4. College of Artificial Intelligence, Nankai University, Tianjin 300071, China

Abstract

With the increasing demand for real-time performance, privacy, and security in AI applications, deploying high-performance neural networks on edge computing platforms has become a research hotspot. Because common edge computing platforms are constrained in storage, computing power, and power consumption, edge deployment of deep neural networks remains a major challenge. One way to overcome these challenges is to compress an existing neural network so that it fits the deployment conditions of the device. Commonly used model compression algorithms include pruning, quantization, and knowledge distillation. Because these methods have complementary strengths, combining them can achieve better compression and acceleration, and such joint compression is becoming a research hotspot. This paper first gives a brief overview of commonly used model compression algorithms, then summarizes three common joint compression schemes: "knowledge distillation + pruning", "knowledge distillation + quantization", and "pruning + quantization", focusing on the basic ideas and methods of joint compression. Finally, key future directions for joint optimization methods in neural network compression are proposed.
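As a concrete illustration of how these joint schemes compose, below is a minimal PyTorch sketch of a "knowledge distillation + pruning" pipeline followed by post-training dynamic quantization. It is not the authors' method: the toy teacher/student models, the 50% sparsity level, the temperature T, and the loss weight alpha are all hypothetical choices made for the example.

```python
# Minimal sketch (illustrative, not from the paper): prune a student network,
# fine-tune it by distilling from a teacher, then quantize for edge deployment.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

teacher = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# 1) Pruning: zero out 50% of each Linear layer's weights by L1 magnitude.
for module in student.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-label KL term (scaled by T^2) blended with the hard-label CE term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# 2) Distillation: fine-tune the pruned student against the teacher's outputs.
optimizer = torch.optim.SGD(student.parameters(), lr=0.01)
x = torch.randn(32, 784)                 # stand-in for a real data loader
labels = torch.randint(0, 10, (32,))
with torch.no_grad():
    teacher_logits = teacher(x)
loss = distillation_loss(student(x), teacher_logits, labels)
optimizer.zero_grad()
loss.backward()                          # pruning masks keep pruned weights at zero
optimizer.step()

# Make the pruned weights permanent before export.
for module in student.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")

# 3) Quantization: convert Linear layers to int8 (post-training, dynamic).
quantized = torch.quantization.quantize_dynamic(
    student, {nn.Linear}, dtype=torch.qint8
)
```

How the stages are ordered and interleaved (e.g., distilling while pruning, or quantization-aware training instead of post-training quantization) is precisely the design space that the joint compression methods surveyed in this paper explore.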

Key words

neural network / compression / pruning / quantization / knowledge distillation / model compression / deep learning


Funding

National Natural Science Foundation of China (62373343)

Beijing Natural Science Foundation (L233036)

Publication Year

2024