
ControlEval


ControlEval is an evaluation dataset of 500 control system design tasks, each with its own numerical performance specifications.

Type: RL Env
Runtime: ORS
License: unknown
Size: 500 tasks
Published: Mar 2026



OpenReward Environment

Description

ControlEval is an environment for evaluating LLM agents on classical control system design. Given a plant transfer function G(s) and performance specifications, agents must design a controller C(s) such that the closed-loop system simultaneously satisfies stability, robustness, and time-domain performance constraints.
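As a minimal sketch of what "closed-loop system" means here, the snippet below forms T(s) = G(s)C(s) / (1 + G(s)C(s)) from coefficient lists. It assumes a unity negative-feedback loop, which the description above does not state explicitly, and the helper names (`poly_mul`, `poly_add`, `closed_loop`) are illustrative rather than part of the environment.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (descending powers of s)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_add(a, b):
    """Add two polynomials, left-padding the shorter one with zeros."""
    n = max(len(a), len(b))
    a = [0] * (n - len(a)) + list(a)
    b = [0] * (n - len(b)) + list(b)
    return [x + y for x, y in zip(a, b)]


def closed_loop(num_g, den_g, num_c, den_c):
    """Return (num, den) of T(s) = G C / (1 + G C), assuming unity negative feedback."""
    num_l = poly_mul(num_g, num_c)  # numerator of the loop transfer function L = G C
    den_l = poly_mul(den_g, den_c)  # denominator of L
    return num_l, poly_add(num_l, den_l)


# Plant G(s) = 1 / (s + 1) with a PI controller C(s) = (2 s + 1) / s.
num_t, den_t = closed_loop([1], [1, 1], [2, 1], [1, 0])
print(num_t, den_t)  # [2, 1] [1, 3, 1], i.e. T(s) = (2 s + 1) / (s^2 + 3 s + 1)
```

The roots of the returned denominator are the closed-loop poles that the stability constraint refers to.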

Capabilities

  • Transfer function analysis and manipulation
  • Controller design (PI, PID, lead/lag compensators, etc.)
  • Frequency-domain analysis (phase/gain margins)
  • Time-domain performance evaluation (settling time, overshoot, steady-state error)
  • Iterative design refinement based on performance feedback
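To illustrate the frequency-domain item, here is a rough standard-library sketch that estimates the phase margin of a loop transfer function L(s) = G(s)C(s). It is not the environment's implementation (which relies on python-control). It assumes the loop gain crosses 1 exactly once between `w_lo` and `w_hi`, and that the phase at crossover lies above -180 degrees so no unwrapping is needed.

```python
import cmath
import math


def polyval(coeffs, s):
    """Evaluate a polynomial (descending powers of s) at s using Horner's rule."""
    acc = 0
    for c in coeffs:
        acc = acc * s + c
    return acc


def loop_response(num, den, w):
    """Complex value of L(jw) = num(jw) / den(jw)."""
    return polyval(num, 1j * w) / polyval(den, 1j * w)


def phase_margin(num, den, w_lo=1e-4, w_hi=1e4):
    """Return (phase margin in degrees, gain crossover frequency in rad/s).

    Assumes |L(jw)| > 1 at w_lo, |L(jw)| < 1 at w_hi, and a single crossover.
    """
    for _ in range(200):  # bisection on a logarithmic frequency scale
        w_mid = math.sqrt(w_lo * w_hi)
        if abs(loop_response(num, den, w_mid)) > 1.0:
            w_lo = w_mid
        else:
            w_hi = w_mid
    w_c = math.sqrt(w_lo * w_hi)
    phase_deg = math.degrees(cmath.phase(loop_response(num, den, w_c)))
    return 180.0 + phase_deg, w_c


# L(s) = 1 / (s (s + 1)): the analytic phase margin is about 51.8 degrees.
pm, w_c = phase_margin([1], [1, 1, 0])
print(round(pm, 1), round(w_c, 3))
```

For loops with several crossovers or large phase lag, a library routine with proper phase unwrapping is the safer choice.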

Compute Requirements

No special compute requirements. Evaluation uses the python-control library for deterministic transfer function computations.

Tasks

500 tasks in a single test split, across 10 categories (50 tasks each):

| Category | System Type |
| --- | --- |
| first_order_stable_fast | 1st-order stable, fast response |
| first_order_stable_moderate | 1st-order stable, moderate response |
| first_order_stable_slow | 1st-order stable, slow response |
| first_order_unstable | 1st-order unstable |
| first_order_w_delay | 1st-order with time delay |
| second_order_stable_fast | 2nd-order stable, fast response |
| second_order_stable_moderate | 2nd-order stable, moderate response |
| second_order_stable_slow | 2nd-order stable, slow response |
| second_order_unstable | 2nd-order unstable |
| higher_order | 3rd–5th order systems |

Each task specifies a plant G(s) via numerator/denominator polynomial coefficients, an optional time delay, and numerical performance constraints.
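One way to picture a task record is the dataclass below. Every field name and every numeric value is hypothetical, chosen only to mirror the sentence above; the actual task schema is not documented here and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlTask:
    """Hypothetical task record; field names are illustrative, not the real schema."""
    category: str
    plant_num: tuple               # numerator coefficients of G(s), descending powers of s
    plant_den: tuple               # denominator coefficients of G(s), descending powers of s
    min_phase_margin_deg: float    # lower bound on the phase margin
    settling_time_range: tuple     # (min, max) in seconds, 2% criterion
    max_steady_state_error: float  # upper bound on the steady-state error
    time_delay: float = 0.0        # seconds; 0.0 means the plant has no delay

    def plant_order(self):
        """Order of the plant, i.e. the degree of its denominator."""
        return len(self.plant_den) - 1


# Made-up example values, for illustration only: G(s) = 1 / (0.1 s + 1).
example = ControlTask(
    category="first_order_stable_fast",
    plant_num=(1.0,),
    plant_den=(0.1, 1.0),
    min_phase_margin_deg=60.0,
    settling_time_range=(0.05, 0.5),
    max_steady_state_error=0.01,
)
print(example.plant_order(), example.time_delay)  # 1 0.0
```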

Reward Structure

Binary (sparse). Reward is 1.0 if ALL of the following constraints are satisfied, 0.0 otherwise:

  1. Stability: all closed-loop poles have real part < -0.01
  2. Phase margin ≥ specified minimum (degrees)
  3. Settling time within specified [min, max] range (2% criterion)
  4. Steady-state error ≤ specified maximum
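The four constraints above combine into an all-or-nothing check. The sketch below shows that logic under assumed dictionary keys (`closed_loop_poles`, `phase_margin_deg`, and so on), which are hypothetical names and not the environment's actual output format.

```python
def reward(metrics, spec):
    """Binary reward: 1.0 only if every constraint holds, else 0.0.

    `metrics` and `spec` use hypothetical key names chosen for this sketch.
    """
    checks = [
        # 1. Stability: every closed-loop pole strictly left of -0.01.
        all(p.real < -0.01 for p in metrics["closed_loop_poles"]),
        # 2. Phase margin at or above the specified minimum (degrees).
        metrics["phase_margin_deg"] >= spec["min_phase_margin_deg"],
        # 3. Settling time inside the specified [min, max] range.
        spec["settling_time_min"] <= metrics["settling_time"] <= spec["settling_time_max"],
        # 4. Steady-state error at or below the specified maximum.
        metrics["steady_state_error"] <= spec["max_steady_state_error"],
    ]
    return 1.0 if all(checks) else 0.0


spec = {
    "min_phase_margin_deg": 45.0,
    "settling_time_min": 0.5,
    "settling_time_max": 2.0,
    "max_steady_state_error": 0.01,
}
good = {
    "closed_loop_poles": [complex(-1.0, 0.0), complex(-5.2, 0.0)],
    "phase_margin_deg": 90.0,
    "settling_time": 0.75,
    "steady_state_error": 0.0,
}
too_slow = dict(good, settling_time=3.9)
marginal = dict(good, closed_loop_poles=[complex(-0.005, 0.0)])

print(reward(good, spec), reward(too_slow, spec), reward(marginal, spec))  # 1.0 0.0 0.0
```

Note that a pole at -0.005 is stable in the usual sense yet still fails constraint 1, because the threshold is -0.01 rather than 0.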

Tools

| Tool | Description |
| --- | --- |
| evaluate | Test a candidate controller C(s). Returns all performance metrics (stability, margins, settling time, steady-state error) without ending the episode. |
| submit | Submit a final controller C(s) for grading. Returns metrics and reward. Ends the episode. |

Both tools accept controller transfer function coefficients: num (numerator) and den (denominator) as lists of floats in descending powers of s.
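As an illustration of the descending-powers convention, the helpers below turn PI and filtered-PID gains into `num`/`den` lists. The function names and the dictionary wrapper are assumptions made for this sketch; only the `num` and `den` argument names come from the text above. The derivative filter is a common way to keep the PID transfer function proper, and whether the environment requires a proper controller is not stated here.

```python
def pi_controller(kp, ki):
    """C(s) = kp + ki / s = (kp s + ki) / s."""
    return {"num": [kp, ki], "den": [1.0, 0.0]}


def pid_controller(kp, ki, kd, tau):
    """C(s) = kp + ki / s + kd s / (tau s + 1), with a first-order derivative filter.

    Over the common denominator s (tau s + 1) this becomes
    ((kp tau + kd) s^2 + (kp + ki tau) s + ki) / (tau s^2 + s).
    """
    num = [kp * tau + kd, kp + ki * tau, ki]
    den = [tau, 1.0, 0.0]
    return {"num": num, "den": den}


print(pi_controller(2.0, 1.0))
# {'num': [2.0, 1.0], 'den': [1.0, 0.0]}
print(pid_controller(2.0, 1.0, 0.5, 0.25))
# {'num': [1.0, 2.25, 1.0], 'den': [0.25, 1.0, 0.0]}
```

The trailing 0.0 in each denominator is the constant term; dropping it would silently turn the integrator s into a constant.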

Time Horizon

Multi-turn. Agents can call evaluate iteratively to refine their controller design before calling submit. Typical solutions require 1–10 evaluate calls.
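The evaluate-then-submit pattern can be sketched with a toy stand-in for the `evaluate` tool. Everything here is an assumption for illustration: `mock_evaluate` is valid only for the plant G(s) = 1 / (s + 1) with a controller C(s) = k (s + 1) / s, and its call signature and return keys are hypothetical, not the real tool interface.

```python
import math


def mock_evaluate(num, den):
    """Toy stand-in for `evaluate`, valid only for G(s) = 1 / (s + 1) and
    C(s) = k (s + 1) / s, passed as num=[k, k], den=[1.0, 0.0].

    The controller zero cancels the plant pole, so T(s) = k / (s + k) and the
    step response is 1 - exp(-k t), which enters the 2% band at t = ln(50) / k.
    """
    k = num[0]
    assert num == [k, k] and den == [1.0, 0.0], "mock supports one controller shape only"
    return {"stable": k > 0.01, "settling_time": math.log(50.0) / k}


def tune_gain(evaluate, ts_min, ts_max, k=1.0, max_calls=10):
    """Refine the gain k until the settling time lands in [ts_min, ts_max].

    Returns (num, den, number_of_evaluate_calls), or None if the budget runs out.
    """
    target = 0.5 * (ts_min + ts_max)
    for call in range(1, max_calls + 1):
        num, den = [k, k], [1.0, 0.0]
        metrics = evaluate(num, den)
        ts = metrics["settling_time"]
        if metrics["stable"] and ts_min <= ts <= ts_max:
            return num, den, call  # ready to pass to `submit`
        k *= ts / target  # settling time scales as 1 / k for this loop
    return None


result = tune_gain(mock_evaluate, ts_min=0.5, ts_max=1.0)
print(result)
```

In this toy case the second call already satisfies the range, because the update exploits the exact 1/k scaling. Real tasks generally need a less tidy search over more than one parameter.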

Environment Difficulty

Difficulty varies by category. First-order stable systems are easiest; higher-order and unstable systems are hardest. The original ControlAgent paper reports 53–95% success rates across categories using GPT-4 with domain-specific prompting.

Safety

This environment involves mathematical computation only. No safety concerns.

Citations

@article{guo2024controlagent,
  title={ControlAgent: Automating Control System Design via Novel Integration of LLM Agents and Domain Expertise},
  author={Guo, Xingang and Keivan, Darioush and Syed, Usman and Qin, Lianhui and Zhang, Huan and Dullerud, Geir and Seiler, Peter and Hu, Bin},
  journal={arXiv preprint arXiv:2410.19811},
  year={2024}
}