Attribution
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
arXiv 2024
from 1 paper
Amanda Askell
researcher
Ansh Radhakrishnan
Buck Shlegeris
Carson Denison
Cem Anil
Daniel M. Ziegler
David Duvenaud
Deep Ganguli
Ethan Perez
Evan Hubinger