FrontierCO

FrontierCO is a curated benchmark suite for evaluating ML-based solvers on large-scale and real-world Combinatorial Optimization (CO) problems. The benchmark spans 8 classical CO problems across 5 application domains, providing both training and evaluation instances specifical…

Type: RL Env
Runtime: ORS
License: unknown
Size: 538 tasks
Published: Mar 2026

OpenReward Environment

Description

FrontierCO is an environment for evaluating AI agents on 8 classical combinatorial optimization (CO) problems. Agents write Python code in a sandbox to solve problem instances, then submit solutions for server-side evaluation against known optimal or best-known scores.

Capabilities

  • 8 diverse CO problem types spanning routing, scheduling, facility location, and graph problems
  • Python code generation and execution in a sandbox environment
  • Access to pip install for additional packages
  • Server-side solution evaluation with normalized scoring

Compute Requirements

Each task gives the agent a sandboxed Docker environment with a pre-built instance image. The default sandbox size is 1 CPU and 2 GB RAM. Network access is enabled, and no GPU is required.

License

MIT

Tasks

Splits:

  • train: Validation instances (120 tasks across 7 problem types; FJSP has no validation instances)
  • test: Easy and hard test instances (418 tasks across all 8 problem types: 237 easy, 181 hard)

Problem Types:

| Type | Full Name | Instances |
|------|-----------|-----------|
| TSP | Traveling Salesman Problem | 10 valid, 29 easy, 19 hard |
| MIS | Maximum Independent Set | 20 valid, 37 easy, 16 hard |
| MDS | Minimum Dominating Set | 20 valid, 20 easy, 20 hard |
| CVRP | Capacitated Vehicle Routing Problem | 15 valid, 20 easy, 10 hard |
| CFLP | Capacitated Facility Location Problem | 20 valid, 20 easy, 30 hard |
| CPMP | Capacitated p-Median Problem | 20 valid, 31 easy, 12 hard |
| FJSP | Flexible Job-shop Scheduling Problem | 0 valid, 57 easy, 24 hard |
| STP | Steiner Tree Problem | 15 valid, 23 easy, 50 hard |
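
As a quick consistency check, the per-problem counts listed above can be tallied against the split sizes. The snippet below is only an illustrative sketch: the `INSTANCES` dictionary is a hand-copied transcription of the table, not something the environment exposes.

```python
# Hand-copied from the problem-type table: (valid, easy, hard) per problem.
INSTANCES = {
    "TSP": (10, 29, 19),
    "MIS": (20, 37, 16),
    "MDS": (20, 20, 20),
    "CVRP": (15, 20, 10),
    "CFLP": (20, 20, 30),
    "CPMP": (20, 31, 12),
    "FJSP": (0, 57, 24),
    "STP": (15, 23, 50),
}

# train split = validation instances; test split = easy + hard instances.
train_total = sum(valid for valid, _, _ in INSTANCES.values())
easy_total = sum(easy for _, easy, _ in INSTANCES.values())
hard_total = sum(hard for _, _, hard in INSTANCES.values())
test_total = easy_total + hard_total

# Problem types that actually contribute validation instances.
train_types = [name for name, (valid, _, _) in INSTANCES.items() if valid > 0]

print(f"train: {train_total} tasks across {len(train_types)} problem types")
print(f"test:  {test_total} tasks ({easy_total} easy, {hard_total} hard)")
print(f"total: {train_total + test_total} tasks")
```

The totals come to 120 train tasks and 418 test tasks, which sum to the 538 tasks reported for the environment.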

Reward Structure

  • Normalized reward in [0.0, 1.0] using: 1 - abs(score - optimal) / max(score, optimal)
  • 1.0 = optimal solution, degrades toward 0 for worse solutions
  • Invalid solutions receive 0.0
  • Single evaluation per submission (terminal action)
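
The reward formula above can be sketched in a few lines of Python. This is not the environment's actual evaluator: the function name `normalized_reward`, the `valid` flag, and the handling of a non-positive denominator are assumptions added for illustration.

```python
def normalized_reward(score: float, optimal: float, valid: bool = True) -> float:
    """Sketch of the stated formula: 1 - abs(score - optimal) / max(score, optimal)."""
    if not valid:
        # Invalid solutions receive 0.0.
        return 0.0
    denom = max(score, optimal)
    if denom <= 0:
        # Assumption: objective values are positive. This degenerate case is
        # not described in the source and is handled defensively here.
        return 1.0 if score == optimal else 0.0
    reward = 1.0 - abs(score - optimal) / denom
    # Clamp to the documented [0.0, 1.0] range.
    return min(1.0, max(0.0, reward))


# Minimization example (e.g. a tour 10% longer than the reference).
print(normalized_reward(score=110.0, optimal=100.0))
# Maximization example (e.g. an independent set 10% smaller than the reference).
print(normalized_reward(score=90.0, optimal=100.0))
```

For positive objective values the expression equals `min(score, optimal) / max(score, optimal)`, so the same formula applies to minimization and maximization problems alike.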

Data

Tools

| Tool | Description |
|------|-------------|
| bash | Execute shell commands in sandbox |
| read_file | Read file contents from sandbox |
| write_file | Write file contents to sandbox |
| list_files | List directory contents in sandbox |
| submit | Submit solution JSON for server-side evaluation (terminal) |

Time Horizon

Multi-turn. Agents typically make 10-30+ tool calls: reading instance data, writing solver code, testing, and submitting.
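
To give a sense of the "writing solver code" step, here is a minimal sketch of a baseline an agent might write for a TSP task: a nearest-neighbor construction heuristic. The instance file format and the submission JSON schema are not described here, so the example assumes the coordinates have already been parsed into a list of `(x, y)` tuples; the function names are hypothetical.

```python
import math


def nearest_neighbor_tour(points, start=0):
    """Greedy construction: repeatedly visit the closest unvisited point."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        # Break distance ties by index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (math.dist(last, points[j]), j))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def tour_length(points, tour):
    """Length of the closed tour, including the edge back to the start."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


# Toy instance (assumed format): four corners of a unit square.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
tour = nearest_neighbor_tour(points)
print(tour, tour_length(points, tour))
```

This construction runs in quadratic time, so it is only a starting point for small instances. The largest hard instances would call for spatial indexing and local-search improvement on top of it.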

Environment Difficulty

  • Easy test instances: Historically challenging but solvable by SOTA human-designed solvers
  • Hard test instances: Computationally intensive instances lacking known optimal solutions (includes extreme scales like TSP with 10M nodes, MIS with 8M nodes)
  • Validation instances: Development/tuning instances

Safety

Sandbox execution is isolated. Network access is enabled for pip installs. No sensitive data in instances.

Citations

@article{feng2025frontierco,
      title={FrontierCO: Real-World and Large-Scale Evaluation of Machine Learning Solvers for Combinatorial Optimization},
      author={Shengyu Feng and Weiwei Sun and Shanda Li and Ameet Talwalkar and Yiming Yang},
      year={2025},
      eprint={2505.16952},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
}