Attribution
Learning diverse attacks on large language models for robust red-teaming and safety tuning
arXiv 2024
from 1 paper
Gauthier Gidel
Juho Lee
Kenji Kawaguchi
Lynn Cherif
Minsu Kim
Moksh Jain
Nikolay Malkin
Seanie Lee
Sung Ju Hwang
Yoshua Bengio