What capabilities does BIG-Bench test?

BIG-Bench evaluates factual recall, planning, math, and instruction following.

How can a model improve its BIG-Bench score?

Tools linked to BIG-Bench on Sophon (RL environments, datasets, and scaffolds that target this eval) include Bigbench BBH RL Env (Prime Community).

What license is BIG-Bench under?

BIG-Bench is available under Apache-2.0.

BIG-Bench comprises 204 diverse tasks contributed by 450 researchers at 132 institutions. It is the original "test everything" LLM benchmark.

Implementations, trainers, datasets and scaffolds linked to this eval.

Prime Community

Big Bench + BBH implementation

TMLR · 2022

Introduces BIG-bench, a 200+ task collaborative benchmark spanning logic, social bias, code, and creative reasoning, contributed by 450+ authors.
