Prateek Mittal

Papers: 6

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Safety Alignment Should Be Made More Than Just a Few Tokens Deep

arXiv 2024

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

arXiv 2024

On Evaluating the Durability of Safeguards for Open-Weight LLMs

arXiv 2024

Visual Adversarial Examples Jailbreak Aligned Large Language Models

arXiv 2023

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

arXiv 2023

FALCON: Honest-Majority Maliciously Secure Framework for Private Deep Learning

arXiv 2020

Affiliations

No known affiliations.

Frequent co-authors

from 6 papers

Peter Henderson

Xiangyu Qi

Tinghao Xie

Ashwinee Panda

Boyi Wei

Kaixuan Huang

Luxi He

Ruoxi Jia

Yangsibo Huang

Yi Zeng