Attribution
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
arXiv 2025
Linear Representations of Sentiment in Large Language Models
arXiv 2023
from 2 papers
Neel Nanda
researcher
Adam Karvonen
Arthur Conmy
Atticus Geiger
Callum McDougall
Can Rager
David Chanin
Eoin Farrell
Johnny Lin
Joseph Bloom