Attribution
Training Language Models to Explain Their Own Computations
arXiv 2025
Opening the AI black box: program synthesis via mechanistic interpretability
arXiv 2024
Algorithmic progress in language models
Universal Neurons in GPT2 Language Models
from 4 papers
Tara Rezaei Kheirkhah
Anish Mudide
Anson Ho
researcher
Belinda Z. Li
Chloe Loughridge
David Atkinson
David Owen
Dimitris Bertsimas
Ege Erdil
Eric J. Michaud