Attribution
Best-of-N Jailbreaking
arXiv 2024
Debating with More Persuasive LLMs Leads to More Truthful Answers
Looking Inward: Language Models Can Learn About Themselves by Introspection
from 3 papers
Ethan Perez
Henry Sleight
Aengus Lynch
Akbir Khan
Ansh Radhakrishnan
Dan Valentine
Edward Grefenstette
Erik Jones
Fazl Barez
Felix J Binder