Adam Karvonen

Papers: 5

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability

arXiv 2025

Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers

arXiv 2025

Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning

arXiv 2025

Learning Multi-Level Features with Matryoshka Sparse Autoencoders

arXiv 2025

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models

arXiv 2024

Affiliations

No known affiliations.

Frequent co-authors

from 5 papers

Samuel Marks

4 shared papers

Neel Nanda

researcher

Can Rager

Arnab Sen Sharma

Arthur Conmy

Bart Bussmann

Benjamin Wright

Caden Juang

Callum McDougall

Claudio Mayrink Verdun

1 shared paper