Attribution
u-μP: The Unit-Scaled Maximal Update Parametrization
arXiv 2024
SparQ Attention: Bandwidth-Efficient LLM Inference
arXiv 2023
Unit Scaling: Out-of-the-Box Low-Precision Training
from 3 papers
Carlo Luschi
Douglas Orr
Andres Felipe Cruz-Salinas
Björn Deiseroth
Constantin Eichenberg
Ivan Chelombiev
Josef Dean
Luka Ribar
Lukas Balles
Luke Hudlass-Galley