Attribution
Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
arXiv 2024
LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked
arXiv 2023
from 2 papers
Duen Horng Chau
Shengyun Peng
Alec Helbling
Cory Cornelius
Mansi Phute
Pin-Yu Chen
Sebastian Szyller