BBH: Challenging BIG-Bench Tasks
Active
Tests AI models on a suite of 23 challenging BIG-Bench tasks that previously proved difficult even for advanced language models to solve.
- Publisher
- Google (Alphabet Inc.)
- Domain
- Reasoning
- License
- MIT
- Published
- May 2026
- Notable for
- Benchmark for evaluating reasoning.
Related tools (2)
Implementations, trainers, datasets, and scaffolds linked to this eval.
Papers (1)
FAQ
- What is BBH: Challenging BIG-Bench Tasks?
- Tests AI models on a suite of 23 challenging BIG-Bench tasks that previously proved difficult even for advanced language models to solve.
- How can a model improve its BBH: Challenging BIG-Bench Tasks score?
- Sophon links two tools to BBH: Challenging BIG-Bench Tasks: BBH RL Env (Community) and Bigbench BBH RL Env (Prime Community). Linked tools of this kind are RL environments, datasets, and scaffolds that target this eval.
- What license is BBH: Challenging BIG-Bench Tasks under?
- BBH: Challenging BIG-Bench Tasks is available under the MIT License.