Question 1

What is PaperBench: Evaluating AI's Ability to Replicate AI Research (Work In Progress)?

Accepted Answer

Agents are evaluated on their ability to replicate 20 ICML 2024 Spotlight and Oral papers from scratch. Given a research paper PDF, an addendum with clarifications, and a rubric defining evaluation criteria, the agent must

Question 2

How can a model improve its PaperBench: Evaluating AI's Ability to Replicate AI Research (Work In Progress) score?

Accepted Answer

Sophon lists tools that target this eval, such as RL environments, datasets, and scaffolds. The tool currently linked to PaperBench: Evaluating AI's Ability to Replicate AI Research (Work In Progress) is Paperbench ENV RL Env (Community).

Question 3

What license is PaperBench: Evaluating AI's Ability to Replicate AI Research (Work In Progress) under?

Accepted Answer

PaperBench: Evaluating AI's Ability to Replicate AI Research (Work In Progress) is available under the MIT License.
