Arthur Conmy
Authored papers (8)
Thought Anchors: Which LLM Reasoning Steps Matter?
arXiv 2025
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
arXiv 2025
Eliciting Secret Knowledge from Language Models
arXiv 2025
Improving Steering Vectors by Targeting Sparse Autoencoder Features
arXiv 2024
Interpreting Attention Layer Outputs with Sparse Autoencoders
arXiv 2024
Applying sparse autoencoders to unlearn knowledge in language models
arXiv 2024
Towards Automated Circuit Discovery for Mechanistic Interpretability
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
arXiv 2022
Frequent co-authors
10 co-authors from 8 papers
Neel Nanda (researcher)
Eoin Farrell
Samuel Marks
Yeu-Tong Lau
Adam Karvonen
Adrià Garriga-Alonso
Aengus Lynch
Alexandre Variengien
Augustine N. Mavor-Parker
Bartosz Cywiński