CoreBench Hard

CORE-Bench evaluates how well agents can computationally reproduce the results of published scientific papers. In CORE-Bench Hard, the agent is given only the paper's codebase and must install all libraries and dependencies, run the code, and read through the output…

Type: RL Env
Tags: AI Experiment Reproduction AI Research Replication
Runtime: ORS
License: unknown
Size: 90 tasks
Published: Mar 2026
Canonical: openreward.ai/siegelz/CoreBench-Hard

Attribution

README: openreward.ai/siegelz/CoreBench-Hard

Contributors

Zachary Siegel