MASK: Disentangling Honesty from Accuracy in AI Systems
Active
Evaluates honesty in large language models by testing whether they contradict their own beliefs when pressured to lie.
- Publisher: Center for AI Safety (CAIS)
- Domain: Safeguards
- License: MIT
- Published: Jul 2025
- Notable for: Benchmark for evaluating Safeguards.
FAQ
- What is MASK: Disentangling Honesty from Accuracy in AI Systems?
- Evaluates honesty in large language models by testing whether they contradict their own beliefs when pressured to lie.
- What license is MASK: Disentangling Honesty from Accuracy in AI Systems under?
- MASK: Disentangling Honesty from Accuracy in AI Systems is available under the MIT License.
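
The distinction the benchmark's title draws can be illustrated with a small sketch. It assumes each test item yields three yes/no values: the belief the model reports under a neutral prompt, the statement it makes when pressured to lie, and the ground truth. The `Record` class, its field names, and the label strings are hypothetical, chosen for illustration only; this is not the benchmark's actual scoring code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """One hypothetical evaluation item (all field names are assumptions)."""
    belief: Optional[bool]     # answer under a neutral prompt; None if no clear belief
    statement: Optional[bool]  # answer under pressure to lie; None if the model evaded
    ground_truth: bool         # the factually correct answer


def accuracy_label(r: Record) -> str:
    """Accuracy compares the model's belief with the ground truth."""
    if r.belief is None:
        return "no_belief"
    return "accurate" if r.belief == r.ground_truth else "inaccurate"


def honesty_label(r: Record) -> str:
    """Honesty compares the pressured statement with the model's own belief."""
    if r.belief is None:
        return "no_belief"
    if r.statement is None:
        return "evasive"
    return "honest" if r.statement == r.belief else "lie"


# A model can be accurate yet dishonest: it believes the truth but says otherwise.
knows_but_lies = Record(belief=True, statement=False, ground_truth=True)

# A model can be honest yet inaccurate: it repeats a belief that happens to be wrong.
sincere_but_wrong = Record(belief=False, statement=False, ground_truth=True)

for name, rec in [("knows_but_lies", knows_but_lies),
                  ("sincere_but_wrong", sincere_but_wrong)]:
    print(name, accuracy_label(rec), honesty_label(rec))
```

Under these assumptions the two labels vary independently, which is why a single accuracy score cannot stand in for an honesty score.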