Let's Verify Step by Step

OpenAI paper that releases PRM800K, a dataset of 800,000 step-level human feedback labels, and shows that process supervision substantially outperforms outcome supervision on the MATH dataset.

Publisher: OpenAI
Year: 2023
Venue: ICLR
Authors: 10
Hosting: External source, license unknown

TL;DR

Semantic Scholar

This work finds that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset, and shows that active learning significantly improves the efficacy of process supervision.
