Attribution
Eliciting Secret Knowledge from Language Models
arXiv 2025
Towards eliciting latent knowledge from LLMs with mechanistic interpretability
from 2 papers
Bartosz Cywiński
Neel Nanda
researcher
Senthooran Rajamanoharan
Arthur Conmy
Rowan Wang
Samuel Marks