BBH: Challenging BIG-Bench Tasks

Active

Tests AI models on 23 challenging BIG-Bench tasks on which prior language-model evaluations did not outperform the average human rater.

Open

Publisher: Google (Alphabet Inc.)
Domain: Reasoning
License: MIT
Published: May 2026
Notable for: Benchmark for evaluating multi-step reasoning in language models.
Canonical: github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/bbh

Attribution

README: github.com/UKGovernmentBEIS/inspect_evals/blob/main/src/inspect_evals/bbh/README.md (MIT License)

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

BBH RL Env (Community)

BigBenchHard (BBH) evaluation environment with Chain-of-Thought

Implementation · RL Env · Bbh · Reasoning

Bigbench BBH RL Env (Prime Community)

Prime Community

BIG-Bench + BBH implementation

Implementation · RL Env · Bigbench · Bbh · NLP

Papers

Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

ACL · 2022

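Google paper isolating the 23 hardest BIG-Bench tasks (BBH), on which prior language-model evaluations did not outperform the average human rater. It shows that chain-of-thought prompting lets PaLM surpass the average human rater on 10 of the 23 tasks and Codex (code-davinci-002) on 17 of the 23.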

FAQ

What is BBH: Challenging BIG-Bench Tasks?: A benchmark that tests AI models on 23 challenging BIG-Bench tasks on which prior language-model evaluations did not outperform the average human rater.
How can a model improve its BBH: Challenging BIG-Bench Tasks score?: Sophon links two tools to this eval, BBH RL Env (Community) and Bigbench BBH RL Env (Prime Community); both are RL environment implementations that target BBH. The associated paper also reports that chain-of-thought prompting substantially improves performance on these tasks.
What license is BBH: Challenging BIG-Bench Tasks under?: It is available under the MIT License.