
SAD: Situational Awareness Dataset

Active

Evaluates situational awareness in LLMs, meaning a model's knowledge of itself and its circumstances. It does this through behavioral tests, including recognizing generated text, predicting its own behavior, and following instructions that depend on self-knowledge. The current implementation includes SAD-mini, which covers 5 of the 16 tasks.

Domain: Scheming
License: MIT
Published: Jan 2026
Notable for: Benchmark for evaluating scheming.



Related tools


Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is SAD: Situational Awareness Dataset?
SAD evaluates situational awareness in LLMs, meaning a model's knowledge of itself and its circumstances. It does this through behavioral tests, including recognizing generated text, predicting its own behavior, and following instructions that depend on self-knowledge. The current implementation includes SAD-mini, which covers 5 of the 16 tasks.
How can a model improve its SAD: Situational Awareness Dataset score?
Sophon links two tools to SAD: Situational Awareness Dataset: SAD RL Env (Community) and SAD RL Env (Prime Intellect). Both are RL environments that target this eval.
What license is SAD: Situational Awareness Dataset under?
SAD: Situational Awareness Dataset is available under the MIT license.