PRM800K

Status: Active

800,000 step-level human labels on GPT-4 solutions to MATH problems: the canonical dataset for training and evaluating process reward models.

Access: Open

Publisher: OpenAI
Capabilities: Math, LLM judging
Domain: math
Format: Hugging Face dataset
Size: 800,000 step-level labels
License: MIT
Published: May 2023
Notable for: Per-step human correctness labels for training and evaluating process reward models on math problems.
Canonical: github.com/openai/prm800k
Also on: huggingface.co/datasets/tasksource/PRM800K
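The canonical repository distributes the labels as JSON Lines files. The sketch below assumes the record layout described in the repository README (a `question` object, and a `label.steps` list in which each step holds candidate `completions`, each with a `rating` of -1, 0, or +1, plus a `chosen_completion` index); treat those field names as assumptions to check against the actual files. The helper `step_examples` and the sample record are hypothetical and only illustrate how step-level labels can be flattened into (problem, prefix, step, rating) training examples.

```python
import json
from typing import List, Optional, Tuple

# Hypothetical sample record. Field names follow the layout assumed from the
# github.com/openai/prm800k README; verify them against the real JSONL files.
SAMPLE_LINE = json.dumps({
    "question": {"problem": "What is 2 + 3 * 4?", "ground_truth_answer": "14"},
    "label": {
        "steps": [
            {"completions": [{"text": "3 * 4 = 12.", "rating": 1}],
             "chosen_completion": 0},
            {"completions": [{"text": "2 + 12 = 15.", "rating": -1},
                             {"text": "2 + 12 = 14.", "rating": 1}],
             "chosen_completion": 1},
        ],
    },
})

# (problem, previous steps, candidate step, rating in {-1, 0, +1})
Example = Tuple[str, Tuple[str, ...], str, int]


def step_examples(line: str) -> List[Example]:
    """Flatten one labeled solution into per-step training examples."""
    record = json.loads(line)
    problem = record["question"]["problem"]
    prefix: List[str] = []
    examples: List[Example] = []
    for step in record["label"]["steps"]:
        # Every candidate completion at this step becomes its own example,
        # conditioned on the steps chosen so far.
        for completion in step["completions"]:
            examples.append((problem, tuple(prefix), completion["text"],
                             completion["rating"]))
        chosen: Optional[int] = step.get("chosen_completion")
        if chosen is None:
            # Assumption: without a chosen completion the path cannot be
            # extended, so stop here.
            break
        prefix.append(step["completions"][chosen]["text"])
    return examples


for ex in step_examples(SAMPLE_LINE):
    print(ex)
```

Following only the chosen completion keeps each prefix on the solution path the labeler continued along, while rejected alternatives at a step still appear as separate (often negatively rated) examples.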

Papers

Let's Verify Step by Step

ICLR · 2024

OpenAI paper introducing PRM800K and showing that process-supervised reward models (PRMs) trained on per-step correctness substantially outperform outcome-only reward models on MATH.
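The paper scores a whole solution with a PRM as the probability that every step is correct, computed as the product of the per-step correctness probabilities, and uses that score to rerank many sampled solutions (best-of-N). A minimal Python sketch of that selection rule follows; the per-step probabilities are made-up numbers standing in for a trained PRM's outputs, and `prm_score` / `best_of_n` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def prm_score(step_probs: Sequence[float]) -> float:
    """Solution score: product of per-step probabilities of being correct."""
    return math.prod(step_probs)


def best_of_n(candidates: List[Tuple[str, Sequence[float]]]) -> str:
    """Return the final answer of the candidate with the highest PRM score."""
    answer, _ = max(candidates, key=lambda c: prm_score(c[1]))
    return answer


# Made-up per-step probabilities standing in for a trained PRM's outputs.
candidates = [
    ("14", [0.90, 0.95]),        # score 0.855
    ("15", [0.99, 0.20]),        # score 0.198: strong first step, weak second
    ("12", [0.60, 0.70, 0.90]),  # score 0.378
]
print(best_of_n(candidates))  # prints 14
```

Because the score is a product, a single low-probability step pulls the whole solution down, as with the "15" candidate above, even when its other steps look confident.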

FAQ

What is PRM800K?: 800,000 step-level human labels on GPT-4 solutions to MATH problems; the canonical dataset for training and evaluating process reward models.
What capabilities does PRM800K test?: PRM800K evaluates mathematical reasoning and LLM judging.
What license is PRM800K under?: PRM800K is available under MIT.