

Multi-exit self-distillation with appropriate teachers
Multi-exit architectures allow early-stop inference to reduce computational cost, which makes them usable in resource-constrained circumstances. Recent work combines the multi-exit architecture with self-distillation to simultaneously achieve high efficiency and decent performance at different network depths. However, existing methods mainly transfer knowledge from deep exits or from a single ensemble to guide all exits, without considering that inappropriate learning gaps between students and teachers may degrade model performance, especially at shallow exits. To address this issue, we propose Multi-exit self-distillation with Appropriate TEachers (MATE), which provides diverse and appropriate teacher knowledge for each exit. In MATE, multiple ensemble teachers are obtained from all exits using different trainable ensemble weights. Each exit then receives knowledge from all teachers while focusing mainly on its primary teacher, which keeps the student-teacher gap appropriate for efficient knowledge transfer. In this way, MATE achieves diversity in knowledge distillation while ensuring learning efficiency. Experimental results on CIFAR-100, TinyImageNet, and three fine-grained datasets demonstrate that MATE consistently outperforms state-of-the-art multi-exit self-distillation methods across various network architectures.
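The abstract describes the mechanism only at a high level, so the following is a minimal standard-library Python sketch of one plausible reading of it, not the authors' implementation. It assumes: one ensemble teacher per exit, with teacher i acting as exit i's primary teacher; each teacher's mixing coefficients are the softmax of a row of trainable raw weights; the primary teacher gets a fixed weight `primary_weight` and the remaining mass is split evenly across the other teachers; the per-pair loss is the usual temperature-scaled KL divergence (with the T² factor from standard knowledge distillation); and early-stop inference uses a simple max-probability confidence threshold. All function names and parameters (`ensemble_teachers`, `mate_style_kd_loss`, `early_exit_predict`, `primary_weight`, `temperature`, `threshold`) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(logits: Vector, temperature: float = 1.0) -> Vector:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Vector, q: Vector) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ensemble_teachers(exit_logits: List[Vector], raw_weights: List[Vector]) -> List[Vector]:
    """Build one ensemble teacher per row of raw_weights (hypothetical parameterization).

    Teacher j is a convex combination of all exits' logits whose mixing
    coefficients are softmax(raw_weights[j]).
    """
    num_classes = len(exit_logits[0])
    teachers = []
    for w in raw_weights:
        coeffs = softmax(w)
        teachers.append(
            [sum(c * z[k] for c, z in zip(coeffs, exit_logits)) for k in range(num_classes)]
        )
    return teachers


def mate_style_kd_loss(
    exit_logits: List[Vector],
    raw_weights: List[Vector],
    primary_weight: float = 0.7,
    temperature: float = 3.0,
) -> float:
    """Sum over exits of a weighted KL from every teacher to that exit.

    Assumption: exit i's primary teacher is teacher i and gets primary_weight;
    the other teachers share (1 - primary_weight) equally. In an autodiff
    framework the teacher distributions would typically be treated as targets.
    """
    teachers = ensemble_teachers(exit_logits, raw_weights)
    n = len(exit_logits)
    total = 0.0
    for i, student in enumerate(exit_logits):
        q = softmax(student, temperature)
        for j, teacher in enumerate(teachers):
            alpha = primary_weight if i == j else (1 - primary_weight) / (n - 1)
            p = softmax(teacher, temperature)
            total += alpha * temperature ** 2 * kl_divergence(p, q)
    return total


def early_exit_predict(exit_logits: List[Vector], threshold: float = 0.9) -> Tuple[int, int]:
    """Return (exit_index, predicted_class) at the first exit confident enough.

    For simplicity all exits' logits are precomputed here; a real network
    would stop computing deeper layers once an exit fires. The last exit
    always answers.
    """
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        conf = max(probs)
        if conf >= threshold or i == len(exit_logits) - 1:
            return i, probs.index(conf)
    raise ValueError("exit_logits must not be empty")


if __name__ == "__main__":
    exits = [[2.0, 0.5, 0.1], [3.0, 0.2, 0.1], [4.0, 0.1, 0.0]]
    weights = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
    print(mate_style_kd_loss(exits, weights))
    print(early_exit_predict(exits, threshold=0.85))  # (1, 0)
```

The fixed `primary_weight` is the simplest way to express "focusing mainly on its primary teacher"; a learned or scheduled weighting would fit the same structure.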

Keywords: Multi-exit architecture; Knowledge distillation; Learning gap

Wujie Sun, Defang Chen, Can Wang, Deshi Ye, Yan Feng, Chun Chen


College of Computer Science and Technology, Zhejiang University, Hangzhou 310000, China


Funding: National Natural Science Foundation of China (U1866602); Starry Night Science Fund of Zhejiang University Shanghai Institute for Advanced Study, China (SN-ZJU-SIAS-001)

2024

Frontiers of Information Technology & Electronic Engineering
Zhejiang University


CSTPCD
Impact factor: 0.371
ISSN:2095-9184
Year, volume (issue): 2024, 25(4)