FrontierCO
Description
FrontierCO is an environment for evaluating AI agents on 8 classical combinatorial optimization (CO) problems. Agents write Python code in a sandbox to solve problem instances, then submit solutions for server-side evaluation against known optimal or best-known scores.
Capabilities
- 8 diverse CO problem types spanning routing, scheduling, facility location, and graph problems
- Python code generation and execution in a sandbox environment
- Access to `pip install` for additional packages
- Server-side solution evaluation with normalized scoring
Compute Requirements
Agents are given a sandboxed Docker environment with a pre-built instance image for each task. The default sandbox size is 1 CPU and 2 GB RAM. Network access is enabled. No GPU is required.
License
Tasks
Splits:
- `train`: Validation instances (~120 tasks across 7 problem types)
- `test`: Easy + hard test instances (~450+ tasks across all 8 problem types)
Problem Types:
| Type | Full Name | Instances |
|---|---|---|
| TSP | Traveling Salesman Problem | 10 valid, 29 easy, 19 hard |
| MIS | Maximum Independent Set | 20 valid, 37 easy, 16 hard |
| MDS | Minimum Dominating Set | 20 valid, 20 easy, 20 hard |
| CVRP | Capacitated Vehicle Routing Problem | 15 valid, 20 easy, 10 hard |
| CFLP | Capacitated Facility Location Problem | 20 valid, 20 easy, 30 hard |
| CPMP | Capacitated p-Median Problem | 20 valid, 31 easy, 12 hard |
| FJSP | Flexible Job-shop Scheduling Problem | 0 valid, 57 easy, 24 hard |
| STP | Steiner Tree Problem | 15 valid, 23 easy, 50 hard |
Reward Structure
- Normalized reward in [0.0, 1.0], computed as `1 - abs(score - optimal) / max(score, optimal)`
- 1.0 = optimal solution; the reward degrades toward 0 for worse solutions
- Invalid solutions receive 0.0
- Single evaluation per submission (terminal action)
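The reward formula above can be sketched in a few lines of Python. This is only an illustration, not the server's actual evaluation code: the function name `normalized_reward`, the `valid` flag, the handling of a non-positive denominator, and the final clamp to [0.0, 1.0] are all assumptions added for the example.

```python
def normalized_reward(score: float, optimal: float, valid: bool = True) -> float:
    """Sketch of 1 - abs(score - optimal) / max(score, optimal).

    `optimal` stands for the known optimal or best-known score.
    """
    if not valid:
        # Invalid solutions receive 0.0.
        return 0.0

    denom = max(score, optimal)
    if denom <= 0:
        # Assumption: the formula is undefined here, so an exact match
        # is treated as 1.0 and anything else as 0.0.
        return 1.0 if score == optimal else 0.0

    reward = 1.0 - abs(score - optimal) / denom
    # Assumption: clamp so the result always stays within [0.0, 1.0].
    return min(1.0, max(0.0, reward))
```

For positive scores the formula is symmetric in the direction of optimization: a minimization result of 125 against a best-known 100 and a maximization result of 80 against a best-known 100 both come out at roughly 0.8.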
Data
- Source: CO-Bench/FrontierCO on HuggingFace
- Instance files in problem-specific formats (.tsp, .mis, .gr, .vrp, .plc, .txt, .fjs, .stp)
- ~5 GB total with LFS
Tools
| Tool | Description |
|---|---|
| `bash` | Execute shell commands in sandbox |
| `read_file` | Read file contents from sandbox |
| `write_file` | Write file contents to sandbox |
| `list_files` | List directory contents in sandbox |
| `submit` | Submit solution JSON for server-side evaluation (terminal) |
Time Horizon
Multi-turn. Agents typically make 10-30+ tool calls: reading instance data, writing solver code, testing, and submitting.
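As a rough illustration of the "writing solver code" and "testing" steps, the sketch below shows the kind of baseline an agent might write for a small TSP instance: a nearest-neighbor heuristic plus a local validity check. Everything here is hypothetical. The function names, the in-memory coordinate list, and the choice of heuristic are assumptions; instance parsing and the submission JSON format are not shown because they are not specified here.

```python
import math


def nearest_neighbor_tour(coords):
    """Greedy tour: start at node 0, then always visit the closest unvisited node."""
    n = len(coords)
    if n == 0:
        return []
    unvisited = set(range(1, n))
    tour = [0]
    while unvisited:
        last = coords[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, coords[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def tour_length(coords, tour):
    """Euclidean length of the closed tour, including the edge back to the start."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def is_valid_tour(tour, n):
    """Local sanity check before submitting: every node appears exactly once."""
    return sorted(tour) == list(range(n))


# Hypothetical toy instance (a 2 x 1 rectangle), used only to exercise the code.
coords = [(0, 0), (0, 1), (2, 1), (2, 0)]
tour = nearest_neighbor_tour(coords)
assert is_valid_tour(tour, len(coords))
print(tour, tour_length(coords, tour))
```

This heuristic runs in quadratic time, so it would only be a starting point: the largest hard instances (for example, TSP with 10M nodes) would need a far more scalable approach within the 1 CPU and 2 GB RAM default sandbox.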
Environment Difficulty
- Easy test instances: Historically challenging but solvable by SOTA human-designed solvers
- Hard test instances: Computationally intensive instances lacking known optimal solutions (includes extreme scales like TSP with 10M nodes, MIS with 8M nodes)
- Validation instances: Development/tuning instances
Safety
Sandbox execution is isolated. Network access is enabled for pip installs. No sensitive data in instances.
Citations
@article{feng2025frontierco,
title={FrontierCO: Real-World and Large-Scale Evaluation of Machine Learning Solvers for Combinatorial Optimization},
author={Shengyu Feng and Weiwei Sun and Shanda Li and Ameet Talwalkar and Yiming Yang},
year={2025},
eprint={2505.16952},
archivePrefix={arXiv},
primaryClass={cs.LG},
}