Neural Networks 2022, Vol. 149, 11. DOI: 10.1016/j.neunet.2022.01.019

Anomalous diffusion dynamics of learning in deep neural networks

Chen, Guozhang; Qu, Cheng Kevin; Gong, Pulin

Author information

  • 1. School of Physics, University of Sydney

Abstract

Learning in deep neural networks (DNNs) is implemented through minimizing a highly non-convex loss function, typically by a stochastic gradient descent (SGD) method. This learning process can effectively find generalizable solutions at flat minima. In this study, we present a novel account of how such effective deep learning emerges through the interactions of the SGD and the geometrical structure of the loss landscape. We find that the SGD exhibits rich, complex dynamics when navigating through the loss landscape; initially, the SGD exhibits superdiffusion, which attenuates gradually and changes to subdiffusion at long times when approaching a solution. Such learning dynamics happen ubiquitously in different DNN types such as ResNet, VGG-like networks and Vision Transformers; similar results emerge for various batch-size and learning-rate settings. The superdiffusion process during the initial learning phase indicates that the motion of SGD along the loss landscape possesses intermittent, big jumps; this non-equilibrium property enables the SGD to effectively explore the loss landscape. By adapting methods developed for studying energy landscapes in complex physical systems, we find that such superdiffusive learning processes are due to the interactions of the SGD and the fractal-like regions of the loss landscape. We further develop a phenomenological model to demonstrate the mechanistic role of the fractal-like loss landscape in enabling the SGD to effectively find flat minima. Our results reveal the effectiveness of SGD in deep learning from a novel perspective and have implications for designing efficient deep neural networks. (C) 2022 Elsevier Ltd. All rights reserved.
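The super/subdiffusion regimes described above are conventionally distinguished by the scaling of the mean-squared displacement, MSD(t) ~ t^alpha, with alpha > 1 indicating superdiffusion and alpha < 1 subdiffusion. As a minimal illustrative sketch (not the authors' code; the function name, the choice of lags, and the synthetic Brownian trajectory standing in for a weight trajectory are our own assumptions), the exponent can be estimated from a trajectory of flattened network weights sampled at successive SGD steps:

```python
import numpy as np

def msd_exponent(trajectory, lags):
    """Fit the anomalous-diffusion exponent alpha from MSD(t) ~ t^alpha.

    trajectory: array of shape (T, D), e.g. flattened network weights
    recorded at T successive SGD steps.
    lags: array of positive integer time lags.
    """
    msd = []
    for lag in lags:
        # displacement over the given lag, averaged over start times
        disp = trajectory[lag:] - trajectory[:-lag]
        msd.append(np.mean(np.sum(disp ** 2, axis=1)))
    # slope of log MSD versus log lag gives alpha
    alpha, _ = np.polyfit(np.log(lags), np.log(msd), 1)
    return alpha

# Sanity check on ordinary Brownian motion, which should give alpha near 1
# (normal diffusion); values above 1 would indicate superdiffusion,
# values below 1 subdiffusion.
rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal((10000, 5)), axis=0)
alpha = msd_exponent(walk, np.arange(1, 100))
print(alpha)
```

In this framing, the paper's finding corresponds to alpha starting above 1 early in training and dropping below 1 as SGD settles near a flat minimum.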

Key words

Deep neural networks; Stochastic gradient descent; Complex systems; Energy landscape


Publication year: 2022
Journal: Neural Networks (ISSN: 0893-6080)
Cited by: 4
References: 55