Attribution
Why Do We Need Weight Decay in Modern Deep Learning?
arXiv 2023
SGD with Large Step Sizes Learns Sparse Features
arXiv 2022
from 2 papers
Maksym Andriushchenko
Nicolas Flammarion
Francesco D'Angelo
Loucas Pillaud-Vivien