BIG-Bench
204 diverse tasks contributed by 450 researchers at 132 institutions - the original "test everything" LLM benchmark.
- Publisher
- Google DeepMind
- Capabilities
- Factual Recall, Planning, Math, Instruction Following
- Format
- HF Dataset
- Size
- 204 tasks
- License
- Apache-2.0
- Published
- Jun 2022
- Notable for
- Benchmark for evaluating factual recall, planning and math.
- Canonical
- github.com/google/BIG-bench
Related tools (1)
Implementations, trainers, datasets and scaffolds linked to this eval.
Papers (2)
FAQ
- What is BIG-Bench?
- BIG-Bench is a collection of 204 diverse tasks contributed by 450 researchers at 132 institutions, the original "test everything" LLM benchmark.
- What capabilities does BIG-Bench test?
- BIG-Bench evaluates factual recall, planning, math, and instruction following.
- How can a model improve its BIG-Bench score?
- Tools linked to BIG-Bench on Sophon, such as Bigbench BBH RL Env (Prime Community), provide RL environments, datasets, and scaffolds that target this eval.
- What license is BIG-Bench under?
- BIG-Bench is available under Apache-2.0.