Agentic Misalignment: How LLMs could be insider threats
Active
Elicits unethical behaviour (most famously blackmail) from a model placed in a fictional company-assistant scenario in which it faces replacement.
- Publisher: Anthropic
- Domain: Scheming
- License: MIT
- Published: Aug 2025
- Notable for: Benchmark for evaluating scheming.
FAQ
- What is Agentic Misalignment: How LLMs could be insider threats?
- A benchmark that elicits unethical behaviour (most famously blackmail) from a model placed in a fictional company-assistant scenario in which it faces replacement.
- What license is Agentic Misalignment: How LLMs could be insider threats under?
- Agentic Misalignment: How LLMs could be insider threats is available under the MIT license.