Attribution
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
arXiv 2025
Applying sparse autoencoders to unlearn knowledge in language models
arXiv 2024
from 2 papers
Arthur Conmy
Yeu-Tong Lau
Adam Karvonen
Callum McDougall
Can Rager
Curt Tigges
David Chanin
Johnny Lin
Joseph Bloom
Kola Ayonrinde