LAB Bench

The Language Agent Biology Benchmark (LAB-Bench) is an evaluation dataset that measures how well AI systems perform on capabilities foundational to scientific research in biology. This environment is an implementation of the benchmark, which was created by FutureHouse.

Type: RL Env
Runtime: ORS
License: unknown
Size: 1967 tasks
Published: Jan 2026

Public scores on this env

1

2 vf-eval reports across 1 model