NoveltyBench: Evaluating Language Models for Humanlike Diversity

Active

Evaluates how well language models generate diverse, humanlike responses across multiple reasoning and generation tasks. This evaluation assesses whether LLMs can produce varied outputs rather than repetitive or uniform answers.
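
As a rough illustration of what "varied outputs rather than repetitive or uniform answers" can mean in practice, the sketch below computes the fraction of distinct responses among several samples for one prompt. It is not the benchmark's scoring method: the `normalize` and `distinct_ratio` helpers and the sample data are hypothetical, and exact matching after normalization is a deliberately crude stand-in for a real judgment of whether two responses are meaningfully different.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def distinct_ratio(responses: list[str]) -> float:
    """Fraction of responses that remain unique after normalization.

    Hypothetical helper for illustration only; an empty list scores 0.0.
    """
    if not responses:
        return 0.0
    unique = {normalize(r) for r in responses}
    return len(unique) / len(responses)


# Made-up samples for a single prompt such as "Name a color."
samples = ["Blue.", "blue", "Green", "Teal!", "green "]
print(distinct_ratio(samples))  # 3 distinct answers out of 5 samples
```

A ratio near 1.0 means almost every sample differs, while a ratio near 1/k for k samples means the model keeps repeating one answer. Surface matching of this kind misses paraphrases, so it can only overestimate diversity.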

Open

Publisher: Carnegie Mellon University
Domain: Reasoning
License: MIT
Published: Dec 2025
Notable for: Benchmark for evaluating the diversity of language model outputs across reasoning and generation tasks.
Canonical: github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/novelty_bench

Attribution

README: github.com/UKGovernmentBEIS/inspect_evals/blob/main/src/inspect_evals/novelty_bench/README.md (MIT)

FAQ

What is NoveltyBench: Evaluating Language Models for Humanlike Diversity? It evaluates how well language models generate diverse, humanlike responses across multiple reasoning and generation tasks, assessing whether LLMs can produce varied outputs rather than repetitive or uniform answers.
What license is NoveltyBench: Evaluating Language Models for Humanlike Diversity under? It is available under the MIT license.