Attribution
The AdEMAMix Optimizer: Better, Faster, Older
arXiv 2024
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Sinkformers: Transformers with Doubly Stochastic Attention
arXiv 2021
from 3 papers
Amitis Shidani
Dan Busbridge
David Grangier
Eeshan Dhekane
Federico Danieli
Floris Weers
Gabriel Peyré
Jagrit Digani
Jason Ramapuram
Mathieu Blondel