Attribution
The pitfalls of next-token prediction
arXiv 2024
A Language Model's Guide Through Latent Space
Scaling MLPs: A Tale of Inductive Bias
Random Teachers are Good Teachers
arXiv 2023
from 4 papers
Sotiris Anagnostidis
Thomas Hofmann
Dimitri von Rütte
Felix Sarnthein
Vaishnavh Nagarajan