Attribution
Alignment faking in large language models
arXiv 2024
from 1 paper
Akbir Khan
Benjamin Wright
Buck Shlegeris
Carson Denison
David Duvenaud
Ethan Perez
Evan Hubinger
Fabien Roger
Jack Chen
Jared Kaplan (co-founder / Chief Science Officer)