Attribution
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs (arXiv 2024)
Representation Engineering: A Top-Down Approach to AI Transparency (arXiv 2023)
from 2 papers
Abhay Sheshadri
Aengus Lynch
Aidan Ewart
Alex Mallen
Alexander Pan
Andy Zou (founder)
Ann-Kathrin Dombrowski
Asa Cooper Stickland (researcher)
Cindy Wu
Dan Hendrycks (director)