Attribution
On Evaluating the Durability of Safeguards for Open-Weight LLMs
arXiv 2024
Universal and Transferable Adversarial Attacks on Aligned Language Models
arXiv 2023
Authors (from 2 papers):
Nicholas Carlini
Andy Zou
Boyi Wei
J. Zico Kolter
Luxi He
Matt Fredrikson
Matthew Jagielski
Peter Henderson
Prateek Mittal
Tinghao Xie