Let's Verify Step by Step
OpenAI paper that releases PRM800K, an 800K-step process reward dataset, and shows process supervision substantially beats outcome supervision on MATH.
- Publisher: OpenAI
- Year: 2023
- Venue: ICLR
- Authors: 10
- Hosting: External source, license unknown
TL;DR
Semantic Scholar
This work conducts its own investigation and finds that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. It also shows that active learning significantly improves the efficacy of process supervision.