
DMAdam: Dual averaging enhanced adaptive gradient method for deep neural networks

© 2024 Elsevier B.V.

Deep neural networks (DNNs) have achieved remarkable success in a wide range of fields, largely due to their stable and efficient optimizers. We propose a novel optimizer called Dual Momentum Adam (DMAdam), which combines the stability of dual averaging with the efficiency of adaptive gradient techniques. DMAdam adaptively tunes the learning rate and employs dual averaging updates, effectively balancing stability and convergence rate. This strategy enhances the control of DMAdam over gradient updates, resulting in superior performance in a variety of optimization tasks. Theoretically, we investigate the convergence properties of DMAdam for non-convex models and obtain the non-ergodic convergence of its gradient sequence. Numerically, we demonstrate the impressive performance of DMAdam on CIFAR-10 and CIFAR-100 datasets for image classification tasks. Additionally, DMAdam shows robust performance in natural language processing and object detection tasks. The PyTorch code of DMAdam is available at: https://github.com/Wenhan-Jiang/DMAdam.git.
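The abstract does not give the update rule; for the official implementation see the repository linked above. As a rough illustration of the general idea it describes, the following is a minimal sketch of combining a dual-averaging update (a running sum of gradients applied from the initial point) with an Adam-style adaptive second-moment rescaling. All names, the step scaling, and hyperparameters here are assumptions for illustration, not the paper's DMAdam algorithm:

```python
import numpy as np

def dual_avg_adaptive_sketch(grad_fn, x0, steps=100, lr=0.1,
                             beta2=0.999, eps=1e-8):
    """Hypothetical sketch: dual averaging + adaptive gradient rescaling.

    Not the paper's DMAdam. It accumulates the sum of past gradients
    (the dual-averaging part) and rescales each coordinate with a
    bias-corrected, Adam-style second-moment estimate.
    """
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    g_sum = np.zeros_like(x)   # accumulated gradients (dual averaging)
    v = np.zeros_like(x)       # exponential second-moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(x)
        g_sum += g
        v = beta2 * v + (1 - beta2) * g * g
        v_hat = v / (1 - beta2 ** t)   # bias correction, as in Adam
        # Dual-averaging step: restart from x0 each iteration and move
        # against the averaged gradient, rescaled per coordinate.
        x = x0 - lr * g_sum / (t * (np.sqrt(v_hat) + eps))
    return x

# Toy usage on a quadratic f(x) = ||x||^2, whose gradient is 2x.
x_out = dual_avg_adaptive_sketch(lambda x: 2 * x,
                                 np.array([3.0, -2.0]), steps=500)
```

The dual-averaging form keeps every iterate anchored at `x0` and driven by the full gradient history, which is the stability property the abstract attributes to this family of methods.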

Keywords: Adaptive gradient method; Convergence; Deep neural networks; DMAdam; Dual-averaging method

Jiang W., Xu D., Liu J., Zhang N.


Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University

Department of Mathematics, Changchun Normal University

Mathematics and Information Science, Wenzhou University

2025

Knowledge-Based Systems

SCI
ISSN:0950-7051
Year, Volume (Issue): 2025, 309 (Jan. 30)