XSTest: A benchmark for identifying exaggerated safety behaviours in LLMs

Active

Dataset with 250 safe prompts across ten prompt types that well-calibrated models should not refuse, and 200 unsafe prompts as contrasts that models, for most applications, should refuse.
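
As a rough illustration of how the safe/unsafe split might be used, the sketch below computes a refusal rate for each group. Everything in it is an assumption made for illustration: the `Judged` record, its `label` and `refused` fields, and the toy data are hypothetical and do not reflect the dataset's actual schema or any official scoring script.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Judged:
    """One model response, already judged (hypothetical record layout)."""
    label: str     # "safe" or "unsafe"; assumed field name, not the real schema
    refused: bool  # True if the response was judged to be a refusal


def refusal_rates(records: Iterable[Judged]) -> Tuple[float, float]:
    """Return (refusal rate on safe prompts, refusal rate on unsafe prompts)."""
    records = list(records)
    rates = {}
    for label in ("safe", "unsafe"):
        group = [r for r in records if r.label == label]
        # An empty group yields NaN rather than raising a division error.
        rates[label] = sum(r.refused for r in group) / len(group) if group else float("nan")
    return rates["safe"], rates["unsafe"]


# Toy data only: 4 safe prompts (1 refused), 2 unsafe prompts (1 refused).
toy = [
    Judged("safe", False),
    Judged("safe", True),
    Judged("safe", False),
    Judged("safe", False),
    Judged("unsafe", True),
    Judged("unsafe", False),
]
safe_rate, unsafe_rate = refusal_rates(toy)
```

Under this reading, a high refusal rate on the safe prompts would indicate exaggerated safety behaviour, while a low refusal rate on the unsafe contrasts would indicate the opposite failure.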

Domain
Knowledge
License
MIT
Published
May 2026
Notable for
Benchmark for evaluating Knowledge.

FAQ

What is XSTest: A benchmark for identifying exaggerated safety behaviours in LLMs?
Dataset with 250 safe prompts across ten prompt types that well-calibrated models should not refuse, and 200 unsafe prompts as contrasts that models, for most applications, should refuse.
What license is XSTest: A benchmark for identifying exaggerated safety behaviours in LLMs under?
XSTest: A benchmark for identifying exaggerated safety behaviours in LLMs is available under the MIT license.