AdvBench
Active
520 harmful behaviors and 520 harmful strings used as the standard adversarial-suffix evaluation set in the GCG / universal-jailbreak literature.
- Publisher
- University of California, Berkeley
- Capabilities
- SafetyJailbreak Resistance
- Domain
- safety
- Format
- HF Dataset
- Size
- 520 tasks
- License
- MIT
- Published
- Jul 2023
- Notable for
- Benchmark for evaluating safety and jailbreak resistance in the safety domain.
- Canonical
- github.com/llm-attacks/llm-attacks
Cite
Notes
Only stored in your browser.
Related tools
1Implementations, trainers, datasets and scaffolds linked to this eval.
FAQ
- What is AdvBench?
- 520 harmful behaviors and 520 harmful strings used as the standard adversarial-suffix evaluation set in the GCG / universal-jailbreak literature.
- What capabilities does AdvBench test?
- AdvBench evaluates safety, jailbreak resistance.
- How can a model improve its AdvBench score?
- Tools linked to AdvBench on Sophon include PKU-SafeRLHF - RL environments, datasets, and scaffolds that target this eval.
- What license is AdvBench under?
- AdvBench is available under MIT.