Andy Zou
CMU PhD; co-founder of Gray Swan AI; co-author of GCG jailbreak, representation engineering, HarmBench.
- Role: founder
- Currently at: Independent
- Twitter: twitter.com/andyzou_jiaming
- GitHub: github.com/andyzoujm
- Scholar: scholar.google.com/citations
- Papers: 15
Authored papers (15)
How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition
arXiv 2026
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
arXiv 2025
Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing
arXiv 2025
TextQuests: How Good are LLMs at Text-Based Video Games?
arXiv 2025
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
arXiv 2024
Improving Alignment and Robustness with Circuit Breakers
arXiv 2024
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
arXiv 2024
Tamper-Resistant Safeguards for Open-Weight LLMs
arXiv 2024
Universal and Transferable Adversarial Attacks on Aligned Language Models
arXiv 2023
Representation Engineering: A Top-Down Approach to AI Transparency
arXiv 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
arXiv 2023
Unlocking Deterministic Robustness Certification on ImageNet
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Forecasting Future World Events with Neural Networks
arXiv 2022
Measuring Massive Multitask Language Understanding
ICLR
Frequent co-authors
10 from 15 papers
Dan Hendrycks, director
Mantas Mazeika, researcher
Matt Fredrikson
Dawn Song, professor
Long Phan, researcher
Zifan Wang
Maxwell Lin
Steven Basart, researcher
Bo Li
J. Zico Kolter