DeepSeek R1-Zero
Open weights
DeepSeek's pure-RL reasoning model, trained directly from DeepSeek-V3-Base via large-scale GRPO with no SFT cold start; the proof of concept paired with R1.
- Publisher
- DeepSeek
- Family
- DeepSeek R
- Params
- 671B total / 37B active
- Context
- 128K
- Input
- Output
- Providers
- self-host
- License
- MIT
- Released
- Jan 2025
Reported on 1 eval
Reported eval scores
- AIME 2024
- 71