Training Verifiers to Solve Math Word Problems

Introduces GSM8K (8.5k grade-school math word problems) and shows that training a verifier to re-rank generated solutions outperforms simply fine-tuning on the dataset.
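The re-ranking idea in the summary above can be sketched in a few lines: sample several candidate solutions, score each with a verifier, and keep the top-scoring one. This is a minimal illustration, not the paper's implementation. The names `rerank_with_verifier` and `toy_score` are hypothetical, and the toy scorer stands in for a trained verifier model (it simply checks the final answer line, which a real verifier cannot do at test time).

```python
from typing import Callable, List, Tuple


def rerank_with_verifier(
    problem: str,
    candidates: List[str],
    score: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the candidate solution that the verifier scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate solution")
    # Score every candidate, then pick the maximum by score.
    scored = [(score(problem, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Hypothetical stand-in for a trained verifier: it rewards solutions whose
# final line matches a known answer. Assumed here purely for illustration.
def toy_score(problem: str, solution: str) -> float:
    return 1.0 if solution.strip().endswith("#### 8") else 0.1


problem = "Tom has 3 apples and buys 5 more. How many apples does he have?"
candidates = [
    "3 * 5 = 15\n#### 15",
    "3 + 5 = 8\n#### 8",
    "5 - 3 = 2\n#### 2",
]
best, best_score = rerank_with_verifier(problem, candidates, toy_score)
print(best)
```

In practice the candidates would come from sampling a fine-tuned generator many times, and `score` would be a learned model estimating the probability that a solution is correct.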

Publisher: OpenAI
Year: 2021
Venue: preprint
ArXiv: arxiv.org/abs/2110.14168
Code: github.com/openai/grade-school-math
Authors: 12
Hosting: External source, license unknown

Attribution

Abstract & full text: arxiv.org/abs/2110.14168
TL;DR: semanticscholar.org/paper/d6045d2ccc9c09ca1671348de86d07da6bc28eea
Code: github.com/openai/grade-school-math

Introduces 1 artifact - 1 eval

TL;DR

Semantic Scholar

It is demonstrated that verification significantly improves performance on GSM8K, and there is strong empirical evidence that verification scales more effectively with increased data than a finetuning baseline.

Artifacts

Evals

GSM8K

Authors

Christopher Hesse, Heewoo Jun, Jacob Hilton, Jerzy "Jerry" Tworek, John Schulman, Karl Cobbe, Łukasz Kaiser, Mark Chen, Matthias Plappert, Mohammad Bavarian, Reiichiro Nakano, Vineet Kosaraju