Attribution
Refusal in Language Models Is Mediated by a Single Direction
arXiv 2024
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
from 2 papers
Neel Nanda
researcher
Aaquib Syed
Andy Arditi
Daniel Paleka
Javier Ferrando
Nina Panickssery
Senthooran Rajamanoharan
Wes Gurnee