
Learning-Rate-Free Learning by D-Adaptation

D-Adaptation sets the learning rate automatically for convex Lipschitz functions. It asymptotically achieves the optimal rate of convergence without back-tracking, line searches, or additional function or gradient evaluations, and it matches hand-tuned learning rates on more than a dozen machine learning problems.

Year: 2023
Venue: arXiv 2023
Authors: 2

Abstract & full text: arxiv.org/abs/2301.07733v5

Abstract

D-Adaptation is an approach to automatically setting the learning rate which asymptotically achieves the optimal rate of convergence for minimizing convex Lipschitz functions, with no back-tracking or line searches, and no additional function value or gradient evaluations per step. Our approach is the first hyper-parameter free method for this class without additional multiplicative log factors in the convergence rate. We present extensive experiments for SGD and Adam variants of our method, where the method automatically matches hand-tuned learning rates across more than a dozen diverse machine learning problems, including large-scale vision and language problems. An open-source implementation is available.
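
The idea in the abstract can be illustrated with a small scalar sketch. The code below is not the authors' released implementation. It is a minimal reading of a dual-averaging scheme in which a running estimate d of the distance from the starting point to a minimizer replaces the hand-set learning rate. The function name d_adapt_dual_averaging, the default d0, and the test problem f(x) = |x - 3| are assumptions made for illustration, and the exact update should be checked against the paper.

```python
import math


def d_adapt_dual_averaging(subgrad, x0, d0=1e-6, n_steps=5000):
    """Scalar dual averaging with a D-Adaptation-style distance estimate.

    Illustrative sketch only: names and defaults are hypothetical.
    Returns the d-weighted average iterate and the final estimate d.
    """
    x, d = x0, d0
    s = 0.0          # d-weighted sum of subgradients
    sum_g2 = 0.0     # running sum of squared subgradients
    weighted = 0.0   # running sum of gamma_k * d_k^2 * g_k^2
    gamma = 0.0      # step size that produced the current iterate
    avg_num = 0.0
    avg_den = 0.0
    for k in range(n_steps):
        g = subgrad(x)  # one subgradient evaluation per step
        if k == 0:
            if g == 0:
                return x0, d0  # the starting point is already a minimizer
            gamma = 1.0 / abs(g)
        # Accumulate the weighted average of the iterates.
        avg_num += d * x
        avg_den += d
        # Update the running sums using the current d and gamma.
        weighted += gamma * d * d * g * g
        s += d * g
        sum_g2 += g * g
        gamma = 1.0 / math.sqrt(sum_g2)
        # Lower-bound estimate of the distance to a minimizer; d never shrinks.
        if s != 0.0:
            d_hat = (gamma * s * s - weighted) / (2.0 * abs(s))
            d = max(d, d_hat)
        # Dual-averaging step taken from the starting point.
        x = x0 - gamma * s
    return avg_num / avg_den, d


def abs_subgrad(x):
    """A subgradient of f(x) = |x - 3| (assumed test problem)."""
    return (x > 3.0) - (x < 3.0)


x_hat, d_final = d_adapt_dual_averaging(abs_subgrad, x0=0.0)
print(x_hat, d_final)
```

In this sketch no learning rate is supplied. The estimate d starts from a tiny value and only grows, and each step uses a single subgradient evaluation, with no line search or back-tracking.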
