Discovery30s
Discovery30s is a benchmark that tests the potential of vintage language models to reproduce scientific discoveries made after their training-data cutoff. We construct the benchmark by taking a known discovery, e.g. Hückel's rule, and then breaking it down into a "question lad…
- Type: RL Env
- Publisher: General Reasoning
- Runtime: ORS
- License: unknown
- Size: 196 tasks
- Published: Feb 2026
Public scores on this env
11 vf-eval reports across 1 model