Attribution
Improving Alignment and Robustness with Circuit Breakers
arXiv 2024
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Authors (from 2 papers):
Andy Zou (founder)
Dan Hendrycks (director)
Justin Wang
Maksym Andriushchenko
Matt Fredrikson
Maxwell Lin
Zico Kolter (professor)
Alexandra Souly
Eric Winsor
Jerome Wynne