Attribution
Inheritune: Training Smaller Yet More Attentive Language Models
arXiv 2024
Early Weight Averaging meets High Learning Rates for LLM Pre-training
arXiv 2023
from 2 papers
Sujay Sanghavi
Abhishek Kumar
Alexandros G. Dimakis
Atula Neerkaje
Jean Kaddour
Ravid Shwartz-Ziv