Neural Networks, 2022, Vol. 148. DOI: 10.1016/j.neunet.2022.01.012

Low-degree term first in ResNet, its variants and the whole neural network family

Sun, Tongfeng [1]; Ding, Shifei; Guo, Lili
Author information

  • 1. Sch Comp Sci & Technol, China Univ Min & Technol

Abstract

To explain the working mechanism of ResNet and its variants, this paper proposes a novel argument of shallow subnetwork first (SSF), essentially low-degree term first (LDTF), which also applies to the whole neural network family. A neural network with shortcut connections behaves as an ensemble of subnetworks of differing depths. Among these subnetworks, the shallow ones are trained first and have a great effect on the performance of the network. The shallow subnetworks roughly correspond to low-degree polynomial terms, while the deep subnetworks correspond to high-degree terms. Based on Taylor expansion, SSF is therefore consistent with LDTF. ResNet is in line with Taylor expansion: shallow subnetworks are trained first to capture low-degree terms, avoiding overfitting; deep subnetworks try to maintain high-degree terms, ensuring high description capacity. Experiments on ResNets and DenseNets show that shallow subnetworks are trained first and play important roles in the training of the networks. The experiments also reveal why DenseNets outperform ResNets: the subnetworks that play vital roles in training the former are shallower than those in the latter. Furthermore, LDTF can also explain the working mechanism of other ResNet variants (SE-ResNets and SK-ResNets) and common phenomena occurring in many neural networks. (C) 2022 Elsevier Ltd. All rights reserved.
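The ensemble-of-subnetworks view stated in the abstract can be made concrete with a toy sketch (not code from the paper; the names `W1`, `f1`, etc. are illustrative assumptions). With linear residual branches, the output of a two-block residual network decomposes exactly into 2² paths of depths 0 through 2, the depth-0 path being the pure shortcut:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two residual blocks with toy linear "residual branches" f1, f2.
# The composition y = (I + f2)((I + f1)(x)) expands into 2^2 = 4 paths.
W1 = rng.normal(scale=0.1, size=(4, 4))
W2 = rng.normal(scale=0.1, size=(4, 4))
f1 = lambda h: W1 @ h
f2 = lambda h: W2 @ h

x = rng.normal(size=4)

# Standard forward pass through the two residual blocks.
h = x + f1(x)
y = h + f2(h)

# "Unraveled" view: the same output as a sum over all shortcut/branch
# paths, grouped by depth (number of residual branches traversed).
paths = [
    x,           # depth 0: pure shortcut (the shallowest subnetwork)
    f1(x),       # depth 1
    f2(x),       # depth 1
    f2(f1(x)),   # depth 2: the deepest subnetwork
]
assert np.allclose(y, sum(paths))
```

With nonlinear branches the decomposition is no longer exact, but the same counting argument still gives the exponential family of subnetworks of differing depths that the SSF/LDTF argument refers to.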

Keywords

ResNets / DenseNets / Shallow subnetwork first / Low-degree term first / Taylor expansion


Publication year

2022

Journal

Neural Networks

Indexed in: EI, SCI
ISSN: 0893-6080
Cited by: 12
References: 34