GSM8K
GSM8K is a classic dataset of grade-school math word problems.
- Type: RL Env
- Publisher: General Reasoning
- Runtime: ORS
- License: unknown
- Size: 8792 tasks
- Published: Jan 2026
- Canonical: openreward.ai/GeneralReasoning/GSM8K
Public scores on this env
3737 vf-eval reports across 37 models
1. Llama 3 405B (Meta Platforms): 96.8
2. Claude 3.5 Sonnet (Anthropic): 96.4
3. GPT-4o (OpenAI): 96.1
4. Llama 3 70B (Meta Platforms): 95.1
5. GPT-4 (OpenAI): 94.2
6. Nemotron 4 340B (NVIDIA): 92.3
7. Minerva 62B, maj5@k: 89
8. Gemini Ultra (Google (Alphabet Inc.)): 88.9
9. Mixtral 8X22B (Mistral AI): 88.2
10. Llama 3 8B (Meta Platforms): 84.5
11. GPT-3.5 Turbo (OpenAI): 81.6
12. Minerva 540B, maj1@k: 78.5
13. Gemma 2 9B (Google (Alphabet Inc.)): 76.7
14. PaLM 540B maj1@40 (Google (Alphabet Inc.)): 74.4
15. Minerva 62B, maj1@k: 68.5
16. Minerva 540B: 58.8
17. Llama 2 70B (Meta Platforms): 56.8
18. Minerva 8B, maj5@k: 56.8
19. PaLM 540B (Google (Alphabet Inc.)): 56.5
20. Mistral 7B (Mistral AI): 53.2