Open RL

Open-RL by Turing consists of self-contained, verifiable, and unambiguous STEM reasoning problems across Physics, Mathematics, Biology, and Chemistry.

Type: RL Env
Runtime: ORS
License: unknown
Size: 40 tasks
Published: Mar 2026

Open-RL

Description

Open-RL is an environment built on the TuringEnterprises/Open-RL dataset for evaluating AI agents on self-contained, verifiable STEM reasoning problems. Problems span Physics, Mathematics, Chemistry, and Biology, and require multi-step reasoning, symbolic manipulation, and numerical computation.

Capabilities

  • Solving complex STEM problems requiring multi-step reasoning
  • Symbolic manipulation and algebraic simplification
  • Numerical computation and derivations
  • Cross-domain scientific reasoning

License

MIT

Tasks

There are 40 tasks in the train split, covering:

  • Physics: Astrophysics, electromagnetism, quantum mechanics, condensed matter, classical mechanics
  • Mathematics: Number theory, special functions, combinatorics, analysis
  • Chemistry: General, medicinal, inorganic chemistry
  • Biology: Molecular biology, immunology, neurobiology, physiology, microbiology

Reward Structure

The reward is binary: 1 if the submitted answer is correct, 0 otherwise. An LLM grader (gpt-5-mini) decides correctness by checking whether the submitted answer is semantically or symbolically equivalent to the ground truth.
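
The mapping from a grader reply to a 0/1 reward can be sketched as follows. This is a minimal illustration, not the environment's actual grading code: the prompt template, the CORRECT/INCORRECT reply convention, and the names `GRADER_TEMPLATE`, `parse_verdict`, `binary_reward`, and `stub_grader` are all assumptions. The LLM call is replaced by an offline stub so the example runs without an API key.

```python
from typing import Callable

# Hypothetical grader prompt; the environment's real prompt is not documented here.
GRADER_TEMPLATE = (
    "Ground truth: {truth}\n"
    "Submitted answer: {submitted}\n"
    "Reply with exactly CORRECT or INCORRECT."
)


def parse_verdict(reply: str) -> int:
    """Map a grader reply to a binary reward; anything other than CORRECT scores 0."""
    return 1 if reply.strip().upper() == "CORRECT" else 0


def binary_reward(submitted: str, truth: str, ask_grader: Callable[[str], str]) -> int:
    """Build the grading prompt, query the grader, and return 0 or 1."""
    prompt = GRADER_TEMPLATE.format(truth=truth, submitted=submitted)
    return parse_verdict(ask_grader(prompt))


def stub_grader(prompt: str) -> str:
    """Offline stand-in for the LLM call: compares answers ignoring spaces and case."""
    lines = prompt.splitlines()
    truth = lines[0].split(": ", 1)[1]
    submitted = lines[1].split(": ", 1)[1]
    same = truth.replace(" ", "").lower() == submitted.replace(" ", "").lower()
    return "CORRECT" if same else "INCORRECT"


print(binary_reward("2*pi", "2 * PI", stub_grader))
print(binary_reward("3", "4", stub_grader))
```

Comparing the whole reply against CORRECT, rather than searching for it as a substring, avoids scoring "INCORRECT" as a pass. In a real deployment `ask_grader` would wrap the call to the grading model.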

Data

Data is sourced from the TuringEnterprises/Open-RL dataset on Hugging Face.

Tools

Single tool:

  • answer(answer: str) - Submit your solution to be graded
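
For illustration, the tool signature above could be described to an agent with a JSON-Schema-style definition. The dictionary layout below is an assumption modeled on common function-calling formats, not the schema the runtime actually uses, and `validate_answer_call` is a hypothetical helper.

```python
# Hypothetical JSON-Schema-style description of the single tool.
ANSWER_TOOL = {
    "name": "answer",
    "description": "Submit your solution to be graded",
    "parameters": {
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
    },
}


def validate_answer_call(arguments: dict) -> str:
    """Return the submitted answer if the arguments match the schema above."""
    if set(arguments) != {"answer"}:
        raise ValueError("expected exactly one argument named 'answer'")
    value = arguments["answer"]
    if not isinstance(value, str):
        raise TypeError("'answer' must be a string")
    return value


print(validate_answer_call({"answer": "x = 3/2"}))
```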

Time Horizon

Open-RL is a single-turn environment: the agent receives a problem, reasons about it, and submits its final answer with exactly one call to the answer tool.
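
The single-turn contract can be sketched as a small episode object that accepts one answer call and then closes. The class `SingleTurnEpisode` and the `exact_match` grader are hypothetical names used only for this example, and exact string matching stands in for the LLM-based grading.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SingleTurnEpisode:
    """One problem, one answer call, one reward (illustrative only)."""
    problem: str
    truth: str
    grade: Callable[[str, str], int]  # returns 0 or 1
    done: bool = False

    def answer(self, answer: str) -> int:
        """Grade the submission and end the episode; a second call is an error."""
        if self.done:
            raise RuntimeError("episode already finished: only one answer call is allowed")
        self.done = True
        return self.grade(answer, self.truth)


def exact_match(submitted: str, truth: str) -> int:
    """Placeholder grader: exact string match after trimming whitespace."""
    return int(submitted.strip() == truth.strip())


episode = SingleTurnEpisode("What is 6 * 7?", "42", exact_match)
reward = episode.answer(" 42 ")
print(reward, episode.done)
```

Because the episode ends on the first submission, the agent has no opportunity to revise an answer after seeing its reward.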

Other Environment Requirements

  • OpenAI API Key: Required for LLM-based grading

Safety

Open-RL presents minimal safety risks. Agents interact only with static STEM problems and submit text answers for grading. There is no network access, filesystem interaction, or execution of agent-generated code. The environment does not involve real-world actions, external systems, or other agents.

Citations

@dataset{OpenRL2024,
  author    = {Turing Enterprises},
  title     = {Open-RL: Self-contained, verifiable STEM reasoning problems},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/TuringEnterprises/Open-RL}
}