Attribution
Adam-mini: Use Fewer Learning Rates To Gain More
arXiv 2024
Why Transformers Need Adam: A Hessian Perspective
from 2 papers
Ruoyu Sun
Tian Ding
Yushun Zhang
Zhi-Quan Luo
Ziniu Li
Chenwei Wu
Diederik P. Kingma
Yinyu Ye