DeepSeek R1-Zero

Open weights

DeepSeek's pure-RL reasoning model, trained directly from DeepSeek-V3-Base via large-scale GRPO (Group Relative Policy Optimization) with no SFT cold start; the proof-of-concept companion to R1.

Publisher: DeepSeek
Params: 671B total / 37B active
Context: 128K
Input
Output
Providers: self-host
License: MIT
Released: Jan 2025

Attribution

Benchmark scores: OpenReward

Reported on 1 eval

Introduced in

Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning