Attribution
Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs
arXiv 2025
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
arXiv 2024
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
from 3 papers
Roger Grosse
Adam Jermyn
Alex Cloud
Amanda Askell
Ansh Radhakrishnan
Aryo Pradipta Gema
Buck Shlegeris
Carson Denison
Dami Choi
Daniel M. Ziegler