Ether0

The dataset used to test the ether0 scientific reasoning model.

Type: RL Env
Runtime: ORS
License: unknown
Size: 325 tasks
Published: Jan 2026

Ether

Description

Ether is an environment for evaluating chemistry reasoning in which every answer is a molecule in SMILES format. It is based on the ether0-benchmark from FutureHouse, and its tasks include reaction prediction (USPTO/ORD), molecular captioning (PubChem), GHS classification prediction, IUPAC name to SMILES conversion, and molecular design. Each task category contains approximately 25 questions for balanced evaluation.

Capabilities

  • Chemistry question answering with SMILES output
  • Reaction prediction and retrosynthesis
  • Molecular property reasoning
  • IUPAC name to structure conversion
  • Molecular design and modification

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

CC BY 4.0

Tasks

There is one split in this environment:

  • test: 325 chemistry questions

All answers are molecules represented as SMILES strings. Task types include:

  • Reaction prediction
  • Molecular captioning
  • GHS classification
  • IUPAC to SMILES conversion
  • Molecular design

Reward Structure

This is a sparse, verifiable reward environment with domain-specific grading:

  1. Agent submits an answer containing a SMILES string
  2. An LLM (gpt-4o) extracts the SMILES from the response
  3. Domain-specific reward functions from the ether0 package evaluate correctness:
    • Exact match: String equality for IUPAC/formula tasks
    • Molecular validity: RDKit parsing and sanitization
    • Structure similarity: Tanimoto similarity for design tasks
    • Reaction correctness: Product matching for synthesis tasks
  4. Reward ranges from 0.0 to 1.0
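The grading steps above can be sketched in plain Python. This is a minimal sketch, not the ether0 package's code: it starts from an already-extracted SMILES string, and `canonicalize` and `fingerprint` are hypothetical injected helpers standing in for RDKit parsing/sanitization and fingerprinting. The task label "design" and the choice to return the raw Tanimoto score as the reward are assumptions. For the other branches it compares canonical SMILES, a common way to make equality insensitive to atom ordering; the source itself only describes plain string equality.

```python
from typing import Callable, Optional

# Hypothetical helper types: in the real environment these roles are filled by
# RDKit (via the ether0 package), not by user-supplied callables.
Canonicalize = Callable[[str], Optional[str]]  # returns None for invalid SMILES
Fingerprint = Callable[[str], frozenset]  # set of "on" bit indices


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def reward(
    task: str,
    predicted: str,
    reference: str,
    canonicalize: Canonicalize,
    fingerprint: Fingerprint,
) -> float:
    """Return a reward in [0.0, 1.0] for one extracted SMILES answer."""
    canon_pred = canonicalize(predicted)
    if canon_pred is None:  # molecular validity: unparsable SMILES earns 0.0
        return 0.0
    canon_ref = canonicalize(reference)
    if canon_ref is None:
        raise ValueError(f"reference SMILES is invalid: {reference!r}")
    if task == "design":  # assumption: similarity itself is the reward
        return tanimoto(fingerprint(canon_pred), fingerprint(canon_ref))
    # exact-match style tasks: compare canonical forms rather than raw strings
    return 1.0 if canon_pred == canon_ref else 0.0
```

With RDKit installed, `canonicalize` could wrap `Chem.MolFromSmiles` followed by `Chem.MolToSmiles`, returning `None` when parsing fails. Injecting the helpers keeps the control flow testable without a chemistry toolkit.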

Data

Data is sourced from the futurehouse/ether0-benchmark Hugging Face dataset. Tasks are derived from commonly used chemistry data sources, including USPTO, ORD, and PubChem.

Tools

Tool | Description
answer | Submit an answer containing a SMILES molecule for evaluation
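Because each episode is single-turn, the agent's entire output is one `answer` call. The sketch below shows what such a tool definition could look like in the OpenAI function-calling format. The environment's actual schema is not documented here, so the parameter name `answer` and the helper `make_answer_call` are hypothetical.

```python
import json

# Hypothetical tool definition: only the tool name and description come from
# the Tools table; the single string parameter "answer" is an assumption.
ANSWER_TOOL = {
    "type": "function",
    "function": {
        "name": "answer",
        "description": "Submit answer containing a SMILES molecule for evaluation",
        "parameters": {
            "type": "object",
            "properties": {
                "answer": {
                    "type": "string",
                    "description": "Final answer; should contain one SMILES string.",
                }
            },
            "required": ["answer"],
        },
    },
}


def make_answer_call(text: str) -> dict:
    """Build the (hypothetical) payload for the agent's single answer call."""
    return {"name": "answer", "arguments": json.dumps({"answer": text})}
```

Keeping the SMILES clearly isolated in the answer text makes the downstream LLM extraction step less error-prone.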

Time Horizon

Single-turn. Agents receive a chemistry question and submit one answer.

Environment Difficulty

Model | Accuracy
ResearchAgent (GPT-4.1) | 48.97%
ResearchAgent (DeepSeek V3) | 32.00%

ether0 (the specialized chemistry model) outperforms general-purpose LLMs on these tasks.

Other Environment Requirements

An OpenAI API key is required for SMILES extraction. Pass it via secrets={"openai_api_key": "..."}.
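To avoid hard-coding the key, the secrets dict can be built from an environment variable. This is a minimal sketch: `OPENAI_API_KEY` is the variable the OpenAI SDK conventionally reads, `build_secrets` is a hypothetical helper, and only the `openai_api_key` key comes from this page.

```python
import os


def build_secrets() -> dict:
    """Build the secrets dict from OPENAI_API_KEY instead of a literal key."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        # Fail early: without a key the SMILES extraction step cannot run.
        raise RuntimeError("Set OPENAI_API_KEY before creating the Ether environment")
    return {"openai_api_key": key}
```

Pass the returned dict as `secrets=` wherever the OpenReward client accepts it; that call is not shown because its exact signature is not documented here.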

Safety

Agents in Ether answer chemistry questions and generate molecular structures. There is a dual-use concern as molecular generation capabilities could be applied to both beneficial and harmful purposes.

Citation

@article{narayanan2025ether0,
  title={Training a Scientific Reasoning Model for Chemistry},
  author={Narayanan, Siddharth M. and Braza, James D. and Griffiths, Ryan-Rhys and Bou, Albert and Wellawatte, Geemi and Ramos, Mayk Caldas and Mitchener, Ludovico and Rodriques, Samuel G. and White, Andrew D.},
  journal={arXiv preprint arXiv:2506.17238},
  year={2025},
  url={https://arxiv.org/abs/2506.17238}
}