Discovery30s

Discovery30s is a benchmark that tests whether vintage language models can reproduce scientific discoveries made after their training-data cutoff. We construct the benchmark by taking a known discovery, such as Hückel's rule, and then breaking it down into a "question lad…

Type: RL Env
Publisher: General Reasoning
Tags: Automated Scientific Discovery
Runtime: ORS
License: unknown
Size: 196 tasks
Published: Feb 2026
Canonical: openreward.ai/GeneralReasoning/Discovery30s

Attribution

README: openreward.ai/GeneralReasoning/Discovery30s
Scores: OpenReward

Public scores on this env

1 vf-eval report across 1 model

1. GPT-5.2 (high) [modern day hindsight], OpenAI: 97.96