s1K
Stanford's hand-curated 1,000-problem reasoning dataset that, paired with budget forcing at inference time, produced results competitive with OpenAI's o1-preview for roughly $50 of training compute.
- Type
- SFT Dataset
- Capabilities
- Math, Scientific Reasoning
- Runtime
- hf_parquet
- License
- Apache-2.0
- Size
- 1,000 problems (s1K; s1K-1.1 reuses the same 1,000 questions with regenerated reasoning traces)
- Published
- Feb 2025
Lift evidence
| Eval | Tools known to lift | Source paper |
|---|---|---|
| AIME 2024: Problems from the American Invitational Mathematics Examination | s1K | - |
| MATH-500 | s1K | - |
| GPQA Diamond | s1K | - |
Models
Notable models trained on it
- s1-32B (Qwen2.5-32B-Instruct fine-tune)
- s1.1-32B

The canonical "cheap o1 reproduction" demonstration.
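
The summary pairs s1K with budget forcing at inference. As a rough illustration of that idea (not the authors' implementation), the sketch below caps the thinking trace at a token budget by forcing the end-of-thinking delimiter, and extends thinking by suppressing an early stop and appending "Wait". The names `step`, `END_THINK`, `WAIT`, and `toy_step` are hypothetical placeholders: a real setup would call an actual decoder and use whatever delimiter the model's chat template defines.

```python
from typing import Callable, List

END_THINK = "</think>"  # placeholder end-of-thinking delimiter (assumption)
WAIT = "Wait"           # string appended to encourage further reasoning

# Hypothetical decoding hook: (context tokens, max new tokens) -> new tokens.
StepFn = Callable[[List[str], int], List[str]]


def budget_forced_thinking(
    step: StepFn,
    prompt: List[str],
    max_tokens: int,
    min_extensions: int = 0,
) -> List[str]:
    """Return a thinking trace of at most max_tokens tokens plus END_THINK."""
    trace: List[str] = []
    extensions_left = min_extensions
    while len(trace) < max_tokens:
        remaining = max_tokens - len(trace)
        # Truncate defensively in case the hook ignores the limit.
        new = step(prompt + trace, remaining)[:remaining]
        if not new:
            break  # the model produced nothing; stop instead of looping
        if END_THINK in new:
            # Keep only the tokens before the model's own stop.
            trace.extend(new[: new.index(END_THINK)])
            if extensions_left == 0:
                break  # no extension requested: accept the stop
            # Suppress the stop and nudge the model to keep reasoning.
            extensions_left -= 1
            trace.append(WAIT)
        else:
            trace.extend(new)
    # Whether the model stopped or the budget ran out, close the trace.
    return trace[:max_tokens] + [END_THINK]


def toy_step(context: List[str], max_new: int) -> List[str]:
    """Toy stand-in for a model: three tokens, then an attempt to stop."""
    return (["think"] * 3 + [END_THINK])[:max_new]


if __name__ == "__main__":
    print(budget_forced_thinking(toy_step, ["Q:"], max_tokens=100, min_extensions=2))
```

In this sketch the upper bound always wins: once `max_tokens` is reached, remaining extensions are dropped and the delimiter is forced, so the final answer can still be decoded from the truncated trace.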