CoreBench Hard

Fresh

CORE-Bench evaluates how well agents can computationally reproduce the results of published scientific papers. In CORE-Bench Hard, the agent is given only the paper's codebase and must install all libraries and dependencies, run the code, and read through the output…

Type: RL Env
Runtime: ORS
License: unknown
Size: 90 tasks
Published: Mar 2026

Contributors: 1