BBH: Challenging BIG-Bench Tasks

Active

Tests AI models on a suite of 23 challenging BIG-Bench tasks that previously proved difficult for even advanced language models.

Domain
Reasoning
License
MIT
Published
May 2026
Notable for
Benchmark for evaluating reasoning.

Related tools

2

Implementations, trainers, datasets and scaffolds linked to this eval.

Papers

1

FAQ

What is BBH: Challenging BIG-Bench Tasks?
It is a benchmark that tests AI models on a suite of 23 challenging BIG-Bench tasks that previously proved difficult for even advanced language models.
How can a model improve its BBH: Challenging BIG-Bench Tasks score?
Sophon links two tools to BBH: Challenging BIG-Bench Tasks: BBH RL Env (Community) and Bigbench BBH RL Env (Prime Community). Linked tools are RL environments, datasets, and scaffolds that target this eval.
What license is BBH: Challenging BIG-Bench Tasks under?
BBH: Challenging BIG-Bench Tasks is available under the MIT license.
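
How might a score over the task suite be computed?
The listing does not describe the scoring procedure, so the following is only a minimal sketch under stated assumptions: each prediction is compared to its target by normalized exact match, and the overall score is the unweighted mean of per-task accuracies. The function names (normalize, macro_exact_match) and the sample records are hypothetical and are not taken from any linked tool.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    # Assumption: answers are compared case-insensitively, ignoring outer whitespace.
    return answer.strip().lower()


def macro_exact_match(records):
    """Compute per-task exact-match accuracy and its unweighted mean.

    records: iterable of (task_name, prediction, target) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, prediction, target in records:
        total[task] += 1
        if normalize(prediction) == normalize(target):
            correct[task] += 1

    per_task = {task: correct[task] / total[task] for task in total}
    # Every task counts equally, regardless of how many examples it has.
    overall = sum(per_task.values()) / len(per_task) if per_task else 0.0
    return per_task, overall


# Illustrative records only; a real run would cover all 23 tasks.
records = [
    ("boolean_expressions", "True", "True"),
    ("boolean_expressions", "False", "True"),
    ("date_understanding", "(B)", "(B)"),
]
per_task, overall = macro_exact_match(records)
print(per_task)
print(overall)
```

Averaging per task rather than per example keeps a task with many examples from dominating the result; an implementation that weights by example count would report a different number on the same predictions.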