Attribution
GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis
arXiv 2024
from 1 paper
Neil Gong
Renjie Pi
Yueqi Xie