When can transformers reason with abstract symbols?
arXiv 2023
Stabilizing Transformer Training by Preventing Attention Entropy Collapse
Vanishing Gradients in Reinforcement Finetuning of Language Models
from 3 papers
Joshua Susskind
Omid Saremi
Arwen Bradley
Dan Busbridge
Emmanuel Abbe
Enric Boix-Adsera
Hattie Zhou
Jason Ramapuram
Jiatao Gu
Josh Susskind