Attribution
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs (arXiv 2025)
Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs
Tell me about yourself: LLMs are aware of their learned behaviors
Authors (from 3 papers):
Jan Betley
Owain Evans (founder)
James Chua
Martín Soto
Xuchan Bao
Andy Arditi
Daniel Tan
Dylan Feng
Jorio Cocola
Nathan Labenz