BIG-Bench

204 diverse tasks contributed by 450 researchers at 132 institutions: the original "test everything" LLM benchmark.

Format: HF Dataset
Size: 204 tasks
License: Apache-2.0
Published: Jun 2022
Notable for: Benchmark for evaluating factual recall, planning, and math.

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is BIG-Bench?
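BIG-Bench (Beyond the Imitation Game Benchmark) is a collaborative benchmark of 204 diverse tasks for evaluating large language models, contributed by 450 researchers at 132 institutions. It was the original "test everything" LLM benchmark.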
What capabilities does BIG-Bench test?
BIG-Bench evaluates factual recall, planning, math, and instruction following.
How can a model improve its BIG-Bench score?
Sophon links BIG-Bench to RL environments, datasets, and scaffolds that target this eval. One linked tool is Bigbench BBH RL Env (Prime Community).
What license is BIG-Bench under?
BIG-Bench is available under Apache-2.0.