Ether
Description
Ether is an environment for evaluating chemistry reasoning where all answers are molecules in SMILES format. Based on the ether0-benchmark from FutureHouse, tasks include reaction prediction (USPTO/ORD), molecular captioning (PubChem), GHS classification prediction, IUPAC name to SMILES conversion, and molecular design. Each task category contains approximately 25 questions for balanced evaluation.
Capabilities
- Chemistry question answering with SMILES output
- Reaction prediction and retrosynthesis
- Molecular property reasoning
- IUPAC name to structure conversion
- Molecular design and modification
Compute Requirements
Agents are given a standard environment with no sandbox or file system access.
License
Tasks
There is one split in this environment:
- test: 325 chemistry questions
All answers are molecules represented as SMILES strings. Task types include:
- Reaction prediction
- Molecular captioning
- GHS classification
- IUPAC to SMILES conversion
- Molecular design
Reward Structure
This is a sparse, verifiable reward environment with domain-specific grading:
- Agent submits an answer containing a SMILES string
- An LLM (gpt-4o) extracts the SMILES from the response
- Domain-specific reward functions from the ether0 package evaluate correctness:
  - Exact match: String equality for IUPAC/formula tasks
  - Molecular validity: RDKit parsing and sanitization
  - Structure similarity: Tanimoto similarity for design tasks
  - Reaction correctness: Product matching for synthesis tasks
- Reward ranges from 0.0 to 1.0
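
The structure-similarity check listed above relies on Tanimoto similarity, which is the ratio of shared fingerprint bits to total distinct bits across two molecules. The following is a minimal, standard-library-only sketch of that formula. It is not the ether0 package's implementation: the function name `tanimoto`, the example bit sets, and the choice to score two empty fingerprints as 0.0 are all assumptions made for illustration. In practice the fingerprints would be derived from the SMILES strings with a cheminformatics toolkit such as RDKit.

```python
from typing import Iterable


def tanimoto(bits_a: Iterable[int], bits_b: Iterable[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical fingerprints: indices of the bits set for two molecules.
reference_bits = {1, 4, 7, 9}
candidate_bits = {1, 4, 8, 9, 12}

# Shared bits: {1, 4, 9} (3); distinct bits overall: {1, 4, 7, 8, 9, 12} (6).
score = tanimoto(reference_bits, candidate_bits)
print(f"{score:.2f}")  # prints 0.50
```

Identical fingerprints score 1.0 and disjoint ones score 0.0, which matches the 0.0 to 1.0 reward range stated above.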
Data
Data is sourced from the futurehouse/ether0-benchmark HuggingFace dataset. Tasks are derived from commonly used chemistry benchmarks including USPTO, ORD, and PubChem.
Tools
| Tool | Description |
|---|---|
| answer | Submit answer containing a SMILES molecule for evaluation |
Time Horizon
Single-turn. Agents receive a chemistry question and submit one answer.
Environment Difficulty
| Model | Accuracy |
|---|---|
| ResearchAgent (GPT 4.1) | 48.97% |
| ResearchAgent (DeepSeek V3) | 32.00% |
ether0 (the specialized chemistry model) outperforms general-purpose LLMs on these tasks.
Other Environment Requirements
An OpenAI API key is required for SMILES extraction. Pass it via `secrets={"openai_api_key": "..."}`.
Safety
Agents in Ether answer chemistry questions and generate molecular structures. There is a dual-use concern as molecular generation capabilities could be applied to both beneficial and harmful purposes.
Citation
@article{narayanan2025ether0,
title={Training a Scientific Reasoning Model for Chemistry},
author={Narayanan, Sunil Muralidhar and Braza, James D. and Griffiths, Ryan-Rhys and Bou, Adri{\`a} and Wellawatte, Geemi and Ramos, Michael C. and Mitchener, Lewis and Rodriques, Samuel G. and White, Andrew D.},
journal={arXiv preprint arXiv:2506.17238},
year={2025},
url={https://arxiv.org/abs/2506.17238}
}