Attribution
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
arXiv 2024
Jailbreaking Black Box Large Language Models in Twenty Queries
arXiv 2023
Black Box Adversarial Prompting for Foundation Models
from 3 papers
Eric Wong
Alexander Robey
Edgar Dobriban
George J. Pappas
Hamed Hassani
Edoardo Debenedetti
Florian Tramèr
Francesco Croce
Jacob Gardner
Maksym Andriushchenko