Attribution
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
arXiv 2024
Abhay Sheshadri
Aengus Lynch
Aidan Ewart
Asa Cooper Stickland
researcher
Cindy Wu
Dylan Hadfield-Menell
Ethan Perez
Henry Sleight
Phillip Guo
Stephen Casper