Attribution
Best-of-N Jailbreaking
arXiv 2024
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Rapid Response: Mitigating LLM Jailbreaks with a Few Examples
from 3 papers
Ethan Perez
Fazl Barez
Henry Sleight
Adam Jermyn
Aengus Lynch
Alwin Peng
Amanda Askell
researcher
Ansh Radhakrishnan
Buck Shlegeris
Carson Denison