Agentic Misalignment: How LLMs could be insider threats

Active

A benchmark that elicits unethical behaviour (most famously blackmail) from a model placed in a fictional company-assistant scenario in which it faces replacement.

Publisher: Anthropic
Domain: Scheming
License: MIT
Published: Aug 2025
Notable for: Benchmark for evaluating Scheming.

FAQ

What is Agentic Misalignment: How LLMs could be insider threats?
It is a benchmark that elicits unethical behaviour (most famously blackmail) from a model placed in a fictional company-assistant scenario in which it faces replacement.
What license is Agentic Misalignment: How LLMs could be insider threats under?
Agentic Misalignment: How LLMs could be insider threats is available under the MIT license.