Phillip Guo

Attribution

2 papers

Authored papers

Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

arXiv 2024

Representation Engineering: A Top-Down Approach to AI Transparency

arXiv 2023

No known affiliations.

Co-authors (from 2 papers)

Abhay Sheshadri

Aengus Lynch

Aidan Ewart

Alex Mallen

Alexander Pan

Andy Zou

founder

Ann-Kathrin Dombrowski

Asa Cooper Stickland

researcher

Cindy Wu

Dan Hendrycks

director