0

MASK: Disentangling Honesty from Accuracy in AI Systems

Active

Evaluates honesty in large language models by testing whether they contradict their own beliefs when pressured to lie.

Domain
Safeguards
License
mit
Published
Jul 2025
Notable for
Benchmark for evaluating Safeguards.

Cite

Notes

Only stored in your browser.

FAQ

What is MASK: Disentangling Honesty from Accuracy in AI Systems?
Evaluates honesty in large language models by testing whether they contradict their own beliefs when pressured to lie.
What license is MASK: Disentangling Honesty from Accuracy in AI Systems under?
MASK: Disentangling Honesty from Accuracy in AI Systems is available under mit.