Attribution
GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis
arXiv 2024
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
arXiv 2023
from 2 papers
Bin Zhu
Emre Kiciman
Fangzhao Wu
Guangzhong Sun
Jingwei Yi
Minghong Fang
Neil Gong
Renjie Pi
Xing Xie