Attribution
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
arXiv 2024
A StrongREJECT for Empty Jailbreaks
JaxMARL: Multi-Agent RL Environments and Algorithms in JAX
arXiv 2023
from 3 papers
Akbir Khan
Alexander Rutherford
Andrei Lupu
Andy Zou
founder
Benjamin Ellis
Bruno Lacerda
Chris Lu
Christian Schroeder de Witt
Dan Hendrycks
director
Derek Duenas