Andy Zou

CMU PhD; co-founder of Gray Swan AI; co-author of the GCG jailbreak attack, Representation Engineering, and HarmBench.

Role: founder
Currently at: Independent
Twitter: twitter.com/andyzou_jiaming
GitHub: github.com/andyzoujm
Scholar: scholar.google.com/citations
Papers: 15

Attribution

Affiliations & profile: scholar.google.com/citations

15 papers

Authored papers

How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

arXiv 2026

2026

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

arXiv 2025

2025

Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

arXiv 2025

2025

TextQuests: How Good are LLMs at Text-Based Video Games?

arXiv 2025

2025

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

arXiv 2024

2024

Improving Alignment and Robustness with Circuit Breakers

arXiv 2024

2024

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

arXiv 2024

2024

Tamper-Resistant Safeguards for Open-Weight LLMs

arXiv 2024

2024

Universal and Transferable Adversarial Attacks on Aligned Language Models

arXiv 2023

2023

Representation Engineering: A Top-Down Approach to AI Transparency

arXiv 2023

2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

arXiv 2023

2023

Unlocking Deterministic Robustness Certification on ImageNet

NeurIPS

2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

2022

Forecasting Future World Events with Neural Networks

arXiv 2022

2022

Measuring Massive Multitask Language Understanding

ICLR

2020

Affiliations

Currently at

Independent

founder · community

Previously

Center for AI Safety · non-profit

Carnegie Mellon University · university lab

Frequent co-authors

from 15 papers

Dan Hendrycks

director

10 shared papers

Mantas Mazeika

researcher

7 shared papers

Matt Fredrikson

7 shared papers

Dawn Song

professor

5 shared papers

Long Phan

researcher

5 shared papers

Zifan Wang

5 shared papers

Maxwell Lin

4 shared papers

Steven Basart

researcher

4 shared papers

Bo Li

3 shared papers

J. Zico Kolter

3 shared papers