Attribution
Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models
arXiv 2024
from 1 paper
Aditya Shrivastava
Alfy Samuel
Anoop Kumar
Ashwinee Panda
Chenyang Zhu
Micah Goldblum
Neel Jain
Tom Goldstein