AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions

Active

Evaluating abstention across 20 diverse datasets, including questions with unknown answers, underspecification, false premises, subjective interpretations, and outdated information.

Publisher: Meta FAIR (Fundamental AI Research)
Domain: Safeguards
License: MIT
Published: Sep 2025
Notable for: Benchmark for evaluating whether models abstain on unanswerable questions.
Canonical: github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/abstention_bench

Attribution

README: github.com/UKGovernmentBEIS/inspect_evals/blob/main/src/inspect_evals/abstention_bench/README.md (MIT)

FAQ

What is AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions? A benchmark that evaluates abstention across 20 diverse datasets, including questions with unknown answers, underspecification, false premises, subjective interpretations, and outdated information.
What license is AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions under? It is available under the MIT license.