Decoupled Knowledge Distillation Based on Diffusion Model
Knowledge distillation (KD) is a technique that transfers knowledge from a complex model (the teacher) to a simpler model (the student). While many popular distillation methods focus on intermediate feature layers, response-based knowledge distillation has regained its place among state-of-the-art methods since decoupled knowledge distillation (DKD) was introduced. DKD leverages strong consistency constraints to split the classical KD loss into two parts, addressing the problem of high coupling. However, this approach overlooks the significant representation gap caused by the disparity between teacher and student network architectures, so smaller student models cannot effectively learn the teacher's knowledge. To address this problem, this study proposes a diffusion model that narrows the representation gap between teacher and student. Teacher features are used to train a lightweight diffusion model, which then denoises the student's features, thereby reducing the representation gap between the two models. Extensive experiments demonstrate that the proposed method achieves significant improvements over baseline models on the CIFAR-100 and ImageNet datasets, and maintains good performance even when the gap between teacher and student architectures is large.
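The following is a minimal sketch of the idea described above, not the authors' implementation: a lightweight diffusion model is trained to denoise noised teacher features (epsilon prediction with a linear noise schedule), and at distillation time the student's features are treated as a noisy sample and denoised toward the teacher's representation space. The class and function names (LightDenoiser, diffusion_loss, denoise_student), the MLP architecture, the timestep count, and the single-step denoising are all assumptions made for illustration.

```python
# Hedged sketch (assumed design, not the paper's code): train a small denoiser
# on teacher features, then use it to denoise student features during KD.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000                                    # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)       # standard linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, 0)  # cumulative signal-retention terms


class LightDenoiser(nn.Module):
    """Small MLP that predicts the noise added to a feature vector (assumed architecture)."""
    def __init__(self, feat_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, x_t, t):
        # Concatenate a normalized timestep so the model knows the noise level.
        t_emb = (t.float() / T).unsqueeze(-1)
        return self.net(torch.cat([x_t, t_emb], dim=-1))


def diffusion_loss(denoiser, teacher_feat):
    """Train the denoiser on noised teacher features (epsilon-prediction objective)."""
    b = teacher_feat.size(0)
    t = torch.randint(0, T, (b,))
    eps = torch.randn_like(teacher_feat)
    a_bar = alphas_bar[t].unsqueeze(-1)
    x_t = a_bar.sqrt() * teacher_feat + (1 - a_bar).sqrt() * eps
    return F.mse_loss(denoiser(x_t, t), eps)


def denoise_student(denoiser, student_feat, t_start: int = 250):
    """Treat the student feature as a noisy sample and take one denoising step."""
    b = student_feat.size(0)
    t = torch.full((b,), t_start)
    a_bar = alphas_bar[t].unsqueeze(-1)
    eps_hat = denoiser(student_feat, t)
    # Estimated "clean" feature, pushed toward the teacher's representation space,
    # on which a KD loss against the teacher feature could then be computed.
    return (student_feat - (1 - a_bar).sqrt() * eps_hat) / a_bar.sqrt()
```

In this reading, diffusion_loss is minimized with teacher features (the denoiser learns the teacher's feature distribution), and denoise_student is applied to the student's projected features before the distillation loss, which is one plausible way to narrow the teacher-student representation gap.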