s1K

Stanford's hand-curated 1,000-problem reasoning dataset that, paired with budget forcing at inference time, produced results competitive with OpenAI's o1-preview for roughly $50 of compute.
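
As a rough illustration of the budget forcing mentioned above, the sketch below caps the length of a reasoning trace and, when the model tries to stop early, swaps the end-of-thinking marker for "Wait" a fixed number of times. It is not the s1 authors' implementation: `budget_force`, the `generate` callback signature, and the `END_THINK` placeholder delimiter are all hypothetical names chosen for this example, and tokens are modeled as plain strings.

```python
from typing import Callable, List

END_THINK = "<end_think>"  # placeholder end-of-thinking delimiter (assumption)
CONTINUE = "Wait"          # string appended to push the model to keep reasoning

# Hypothetical callback: (prompt, trace_so_far, token_budget) -> new tokens.
GenerateFn = Callable[[str, List[str], int], List[str]]


def budget_force(prompt: str, generate: GenerateFn,
                 max_tokens: int, num_waits: int) -> List[str]:
    """Return a thinking trace of at most max_tokens tokens plus END_THINK."""
    trace: List[str] = []
    waits_left = num_waits
    while len(trace) < max_tokens:
        budget = max_tokens - len(trace)
        new_tokens = generate(prompt, trace, budget)[:budget]
        if END_THINK in new_tokens:
            # The model tried to stop: keep only what came before the marker.
            trace.extend(new_tokens[:new_tokens.index(END_THINK)])
            if waits_left > 0 and len(trace) < max_tokens:
                # Suppress the stop and nudge the model to continue.
                trace.append(CONTINUE)
                waits_left -= 1
                continue
            break
        if not new_tokens:
            break  # nothing generated; avoid looping forever
        trace.extend(new_tokens)
    # Force the end of thinking, whether the model stopped or hit the cap.
    return trace + [END_THINK]
```

With a real model, `generate` would wrap a decoding call; here any function with the same shape works, which keeps the control flow easy to test in isolation.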

Open

Type: SFT Dataset
Publisher: Stanford Center for Research on Foundation Models (CRFM)
Capabilities: Math, Scientific Reasoning
Runtime: hf_parquet
License: Apache-2.0
Size: 1,000 problems (s1K and s1K-1.1; s1K-1.1 reuses the same questions with reasoning traces regenerated by DeepSeek-R1)
Published: Feb 2025
Canonical: huggingface.co/datasets/simplescaling/s1K
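
A minimal sketch of pulling the dataset from the canonical location above, assuming the Hugging Face `datasets` package is installed and that the data lives in a `train` split (the split name is an assumption, as is the helper name `load_s1k`). The import sits inside the function so the module can be imported without the dependency.

```python
DATASET_ID = "simplescaling/s1K"  # taken from the canonical URL


def load_s1k(split: str = "train"):
    """Download s1K from the Hugging Face Hub (needs network access)."""
    # Imported lazily; install with `pip install datasets`.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


# Example usage (not run here because it needs the network):
#   ds = load_s1k()
#   print(len(ds), ds.column_names)
```

Printing `column_names` first is a safe way to discover the schema rather than hard-coding field names.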

Lift evidence

Eval	Tools known to lift	Source paper
AIME 2024: Problems from the American Invitational Mathematics Examination	s1K	-
MATH-500	s1K	-
GPQA Diamond	s1K	-

Models

Notable models trained on it

s1-32B (Qwen2.5-32B-Instruct fine-tune)
s1.1-32B
The canonical "cheap o1 reproduction" demonstration.

Papers

Introduces: s1: Simple Test-Time Scaling

Contributors

Niklas Muennighoff, Tatsunori Hashimoto, Zitong Yang