Arthur Conmy

Papers: 8

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Thought Anchors: Which LLM Reasoning Steps Matter?

arXiv 2025

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability

arXiv 2025

Eliciting Secret Knowledge from Language Models

arXiv 2025

Interpreting Attention Layer Outputs with Sparse Autoencoders

arXiv 2024

Applying sparse autoencoders to unlearn knowledge in language models

arXiv 2024

Improving Steering Vectors by Targeting Sparse Autoencoder Features

arXiv 2024

Towards Automated Circuit Discovery for Mechanistic Interpretability

2023

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

arXiv 2022

Affiliations

No known affiliations.

Frequent co-authors

from 8 papers

Neel Nanda

researcher

Eoin Farrell

Samuel Marks

Yeu-Tong Lau

Adam Karvonen

Adrià Garriga-Alonso

Aengus Lynch

Alexandre Variengien

Augustine N. Mavor-Parker

1 shared paper

Bartosz Cywiński

1 shared paper