Ether

Description

Ether is an environment for evaluating chemistry reasoning where all answers are molecules in SMILES format. Based on the ether0-benchmark from FutureHouse, tasks include reaction prediction (USPTO/ORD), molecular captioning (PubChem), GHS classification prediction, IUPAC name to SMILES conversion, and molecular design. Each task category contains approximately 25 questions for balanced evaluation.

Capabilities

Chemistry question answering with SMILES output
Reaction prediction and retrosynthesis
Molecular property reasoning
IUPAC name to structure conversion
Molecular design and modification

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

CC BY 4.0

Tasks

There is one split in this environment:

test: 325 chemistry questions

All answers are molecules represented as SMILES strings. Task types include:

Reaction prediction
Molecular captioning
GHS classification
IUPAC to SMILES conversion
Molecular design

Reward Structure

This is a sparse, verifiable reward environment with domain-specific grading:

The agent submits an answer containing a SMILES string
An LLM (gpt-4o) extracts the SMILES string from the response
Domain-specific reward functions from the ether0 package evaluate correctness:
- Exact match: String equality for IUPAC/formula tasks
- Molecular validity: RDKit parsing and sanitization
- Structure similarity: Tanimoto similarity for design tasks
- Reaction correctness: Product matching for synthesis tasks
Rewards range from 0.0 to 1.0
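The two simplest scoring ideas above, exact string match and Tanimoto similarity, can be sketched as follows. This is a minimal illustration and not the ether0 package's API. The helper names `tanimoto` and `exact_match_reward` are hypothetical. The sketch assumes fingerprints are already available as sets of "on" bit indices. In practice, they would come from a cheminformatics toolkit such as RDKit (for example, Morgan fingerprints).

```python
from typing import Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices.

    Returns |A & B| / |A | B|, which always lies in [0.0, 1.0].
    """
    if not fp_a and not fp_b:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    shared = len(fp_a & fp_b)  # bits set in both fingerprints
    total = len(fp_a | fp_b)   # bits set in either fingerprint
    return shared / total


def exact_match_reward(predicted: str, reference: str) -> float:
    """Hypothetical exact-match reward: 1.0 on string equality (ignoring outer whitespace), else 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


# Usage: 2 shared bits out of 5 distinct bits gives a similarity of 0.4.
print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))
print(exact_match_reward("CCO", " CCO "))
```

Plain string equality does not treat different spellings of the same molecule as equal. For example, "CCO" and "OCC" both describe ethanol but score 0.0 here. This is why structure-level checks such as RDKit parsing matter for tasks where the answer is a molecule.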

Data

Data is sourced from the futurehouse/ether0-benchmark Hugging Face dataset. Tasks are derived from commonly used chemistry benchmarks, including USPTO, ORD, and PubChem.

Tools

Tool	Description
`answer`	Submit answer containing a SMILES molecule for evaluation
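Ether uses gpt-4o to pull the SMILES string out of the submitted answer. For comparison, here is a deterministic stand-in. It is not what Ether does. It is a minimal sketch under a hypothetical formatting convention: the agent wraps its final SMILES in backticks, and the last backtick-quoted span is taken as the answer. The function name `extract_last_backticked` is illustrative only.

```python
import re
from typing import Optional

# Hypothetical convention: the final SMILES is wrapped in single backticks with no spaces inside.
_BACKTICK_SPAN = re.compile(r"`([^`\s]+)`")


def extract_last_backticked(response: str) -> Optional[str]:
    """Return the last backtick-quoted token in the response, or None if there is none."""
    matches = _BACKTICK_SPAN.findall(response)
    return matches[-1] if matches else None


# Usage: the agent mentions a candidate, then gives its final answer.
print(extract_last_backticked("I first considered `CCO`, but the product is `CC(=O)O`."))
```

A fixed rule like this returns None as soon as the answer is not formatted as expected. An LLM-based extractor can tolerate free-form responses, at the cost of an extra API call.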

Time Horizon

Single-turn. Agents receive a chemistry question and submit one answer.

Environment Difficulty

Model	Accuracy
ResearchAgent (GPT-4.1)	48.97%
ResearchAgent (DeepSeek V3)	32.00%

The ether0 paper reports that ether0, a specialized chemistry reasoning model, outperforms general-purpose LLMs on these tasks.

Other Environment Requirements

An OpenAI API key is required for SMILES extraction. Pass it via `secrets={"openai_api_key": "..."}`.

Safety

Agents in Ether answer chemistry questions and generate molecular structures. Molecular generation is a dual-use capability: it could be applied to both beneficial and harmful purposes.

Citation

@article{narayanan2025ether0,
  title={Training a Scientific Reasoning Model for Chemistry},
  author={Narayanan, Sunil Muralidhar and Braza, James D. and Griffiths, Ryan-Rhys and Bou, Adri{\`a} and Wellawatte, Geemi and Ramos, Michael C. and Mitchener, Lewis and Rodriques, Samuel G. and White, Andrew D.},
  journal={arXiv preprint arXiv:2506.17238},
  year={2025},
  url={https://arxiv.org/abs/2506.17238}
}