0

AdvBench

Active

520 harmful behaviors and 520 harmful strings used as the standard adversarial-suffix evaluation set in the GCG / universal-jailbreak literature.

Domain
safety
Format
HF Dataset
Size
520 tasks
License
MIT
Published
Jul 2023
Notable for
Benchmark for evaluating safety and jailbreak resistance in the safety domain.

Cite

Notes

Only stored in your browser.

Related tools

1
View all

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is AdvBench?
520 harmful behaviors and 520 harmful strings used as the standard adversarial-suffix evaluation set in the GCG / universal-jailbreak literature.
What capabilities does AdvBench test?
AdvBench evaluates safety, jailbreak resistance.
How can a model improve its AdvBench score?
Tools linked to AdvBench on Sophon include PKU-SafeRLHF - RL environments, datasets, and scaffolds that target this eval.
What license is AdvBench under?
AdvBench is available under MIT.