Attribution
ZClip: Adaptive Spike Mitigation for LLM Pre-Training
arXiv 2025
Variance Control via Weight Rescaling in LLM Pre-training
A Refined Analysis of Massive Activations in LLMs
from 3 papers
Abhay Kumar
Fabian Güra
Nilabhra Roy Chowdhury