CoreBench Hard
CORE-Bench evaluates the ability of agents to computationally reproduce the results of published scientific papers. In CORE-Bench Hard, the agent is only given the codebase of the paper and must install all libraries and dependencies, run the code, and read through the output…
- Type: RL Env
- Runtime: ORS
- License: unknown
- Size: 90 tasks
- Published: Mar 2026
- Canonical: openreward.ai/siegelz/CoreBench-Hard