Attribution
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
arXiv 2024
Alignment faking in large language models
from 2 papers
Akbir Khan
Benjamin Wright
Buck Shlegeris
Carson Denison
Cem Anil
Dami Choi
David Duvenaud
Ethan Perez
Evan Hubinger
Fabien Roger