Prateek Mittal
Authored papers (6)
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
arXiv 2024
Safety Alignment Should Be Made More Than Just a Few Tokens Deep
arXiv 2024
On Evaluating the Durability of Safeguards for Open-Weight LLMs
arXiv 2024
Visual Adversarial Examples Jailbreak Aligned Large Language Models
arXiv 2023
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
arXiv 2023
FALCON: Honest-Majority Maliciously Secure Framework for Private Deep Learning
arXiv 2020
Affiliations
No known affiliations.
Frequent co-authors
10 from 6 papers