Nora Belrose
9 papers
Authored papers
Partially Rewriting a Transformer in Natural Language
arXiv 2025
Refusal in LLMs is an Affine Function
arXiv 2024
Does Transformer Interpretability Transfer to RNNs?
arXiv 2024
Understanding Gradient Descent through the Training Jacobian
arXiv 2024
Balancing Label Quantity and Quality for Scalable Elicitation
arXiv 2024
LEACE: Perfect linear concept erasure in closed form
NeurIPS 2023 11
Eliciting Latent Predictions from Transformers with the Tuned Lens
arXiv 2023
Eliciting Latent Knowledge from Quirky Language Models
arXiv 2023
Adversarial Policies Beat Superhuman Go AIs
arXiv 2022
Affiliations
No known affiliations.
Frequent co-authors
10 from 9 papers