LAB Bench

The Language Agent Biology Benchmark, or LAB-Bench, is an evaluation dataset for AI systems, designed to measure capabilities foundational to scientific research in biology. This is an implementation of a benchmark made by FutureHouse.

Domain: rl-env
License: unknown
Published: Jan 2026

Attribution

Leaderboard scores: OpenReward

Top score 58 by Claude Opus 4.6 - 1 model reporting (1 frontier)

Top models

[Bar chart: LAB Bench scores, 1 model. Highest value: Claude Opus 4.6 at 58.]

FAQ

What is LAB Bench?
The Language Agent Biology Benchmark, or LAB-Bench, is an evaluation dataset for AI systems intended to benchmark capabilities foundational to scientific research in biology. This is an implementation of a benchmark made by FutureHouse.
What is the current top score on LAB Bench?
The top reported score is 58 by Claude Opus 4.6, with 1 model reporting (1 from a frontier lab).
What license is LAB Bench under?
The license for LAB Bench is listed as unknown.