ControlEval

Description

ControlEval is an environment for evaluating LLM agents on classical control system design. Given a plant transfer function G(s) and performance specifications, agents must design a controller C(s) such that the closed-loop system simultaneously satisfies stability, robustness, and time-domain performance constraints.

Capabilities

Transfer function analysis and manipulation
Controller design (PI, PID, lead/lag compensators, etc.)
Frequency-domain analysis (phase/gain margins)
Time-domain performance evaluation (settling time, overshoot, steady-state error)
Iterative design refinement based on performance feedback
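
Phase margin, one of the graded quantities, is read from the open-loop frequency response L(jω) = C(jω)G(jω) at the gain crossover frequency, where |L(jω)| = 1. The sketch below is plain Python, not the environment's python-control code (python-control exposes this calculation as `control.margin`). It assumes three things:

- the open-loop numerator and denominator are already multiplied out;
- |L(jω)| crosses 1 exactly once, from above, between 1e-4 and 1e4 rad/s;
- the helper names are hypothetical.

```python
import cmath
import math


def polyval(coeffs, s):
    """Evaluate a polynomial given in descending powers of s (Horner's rule)."""
    result = 0j
    for c in coeffs:
        result = result * s + c
    return result


def loop_gain(num, den, w):
    """Open-loop frequency response L(jw) = num(jw) / den(jw)."""
    s = 1j * w
    return polyval(num, s) / polyval(den, s)


def phase_margin(num, den, w_lo=1e-4, w_hi=1e4, iters=100):
    """Return (phase margin in degrees, gain crossover frequency in rad/s).

    Hypothetical helper: assumes |L(jw)| - 1 changes sign exactly once,
    from positive to negative, on [w_lo, w_hi].
    """
    def f(w):
        return abs(loop_gain(num, den, w)) - 1.0

    if f(w_lo) <= 0 or f(w_hi) >= 0:
        raise ValueError("no single gain crossover in the search interval")
    # Bisect in log-frequency for the gain crossover frequency.
    lo, hi = math.log10(w_lo), math.log10(w_hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(10 ** mid) > 0:
            lo = mid  # |L| still above 1: crossover lies at higher frequency
        else:
            hi = mid
    wc = 10 ** (0.5 * (lo + hi))
    phase_deg = math.degrees(cmath.phase(loop_gain(num, den, wc)))
    raw = 180.0 + phase_deg
    # Wrap into [-180, 180) so the principal-value phase from cmath gives the
    # same margin as the unwrapped phase would.
    pm = (raw + 180.0) % 360.0 - 180.0
    return pm, wc
```

For L(s) = 1/(s(s + 1)), calling `phase_margin([1.0], [1.0, 1.0, 0.0])` gives a crossover near 0.786 rad/s and a margin of about 51.8°. Loops with several gain crossovers need a frequency sweep rather than a single bisection.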

Compute Requirements

No special compute requirements. Evaluation uses the python-control library for deterministic transfer function computations.

Tasks

500 tasks in a single test split, across 10 categories (50 tasks each):

Category	System Type
`first_order_stable_fast`	1st-order stable, fast response
`first_order_stable_moderate`	1st-order stable, moderate response
`first_order_stable_slow`	1st-order stable, slow response
`first_order_unstable`	1st-order unstable
`first_order_w_delay`	1st-order with time delay
`second_order_stable_fast`	2nd-order stable, fast response
`second_order_stable_moderate`	2nd-order stable, moderate response
`second_order_stable_slow`	2nd-order stable, slow response
`second_order_unstable`	2nd-order unstable
`higher_order`	3rd–5th order systems

Each task specifies a plant G(s) via numerator/denominator polynomial coefficients, an optional time delay, and numerical performance constraints.
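
Closed-loop stability is decided by the roots of the characteristic polynomial. The sketch below assumes a unity negative-feedback loop, since the description does not state the loop structure. With coefficient lists in descending powers of s, that polynomial is den_C·den_G + num_C·num_G.

A time delay e^(-θs) is not rational, so the sketch substitutes a first-order Padé approximation, (1 - θs/2)/(1 + θs/2). How ControlEval itself handles the delay is not specified here, and all function names are hypothetical.

```python
def polymul(a, b):
    """Multiply two polynomials given in descending powers of s."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def polyadd(a, b):
    """Add two polynomials in descending powers, aligning the constant terms."""
    n = max(len(a), len(b))
    a = [0.0] * (n - len(a)) + list(a)
    b = [0.0] * (n - len(b)) + list(b)
    return [x + y for x, y in zip(a, b)]


def pade1(delay):
    """First-order Pade approximation of exp(-delay * s) as (num, den)."""
    return [-delay / 2.0, 1.0], [delay / 2.0, 1.0]


def closed_loop_char_poly(plant_num, plant_den, ctrl_num, ctrl_den, delay=0.0):
    """den_C*den_G + num_C*num_G for an assumed unity negative-feedback loop.

    A positive plant delay is replaced by its first-order Pade approximation
    (an assumption for illustration, not necessarily the environment's method).
    """
    if delay > 0:
        d_num, d_den = pade1(delay)
        plant_num = polymul(plant_num, d_num)
        plant_den = polymul(plant_den, d_den)
    return polyadd(polymul(ctrl_den, plant_den), polymul(ctrl_num, plant_num))
```

Take G(s) = 2/(s + 1) with the PI controller C(s) = (s + 1)/s. Then `closed_loop_char_poly([2.0], [1.0, 1.0], [1.0, 1.0], [1.0, 0.0])` returns `[1.0, 3.0, 2.0]`, which factors as (s + 1)(s + 2). Both roots lie left of the -0.01 stability threshold.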

Reward Structure

Binary (sparse). Reward is 1.0 if ALL of the following constraints are satisfied, 0.0 otherwise:

Stability: all closed-loop poles have real part < -0.01
Phase margin: at least the specified minimum (degrees)
Settling time: within the specified [min, max] range (2% criterion)
Steady-state error: at most the specified maximum
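
The reward rule is a conjunction of four checks, sketched below under the assumption that the metrics are already computed (for example, taken from the tool output). Both helpers are hypothetical:

- `settling_time_2pct` finds the first sample after which a sampled step response stays within 2% of its final value.
- `binary_reward` applies the thresholds listed above.

The `spec` dictionary keys are also hypothetical; the actual task format may name these fields differently.

```python
import math


def settling_time_2pct(t, y, y_final=None):
    """First sample time after which y stays within 2% of y_final.

    Returns math.inf if the last sample is still outside the band.
    """
    if y_final is None:
        y_final = y[-1]
    band = 0.02 * abs(y_final)
    # Walk backwards to find the last sample outside the 2% band.
    for k in range(len(y) - 1, -1, -1):
        if abs(y[k] - y_final) > band:
            return t[k + 1] if k + 1 < len(t) else math.inf
    return t[0]


def binary_reward(poles, phase_margin_deg, settling_time, ss_error, spec):
    """1.0 if every constraint holds, else 0.0 (spec keys are hypothetical)."""
    stable = all(p.real < -0.01 for p in poles)
    ok = (
        stable
        and phase_margin_deg >= spec["min_phase_margin"]
        and spec["settling_time_min"] <= settling_time <= spec["settling_time_max"]
        and ss_error <= spec["max_ss_error"]
    )
    return 1.0 if ok else 0.0
```

For y(t) = 1 - e^(-t), the 2% settling time is ln 50 ≈ 3.91 s. The sampled version reproduces this to within one sample period.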

Tools

Tool	Description
`evaluate`	Test a candidate controller C(s). Returns all performance metrics (stability, margins, settling time, steady-state error) without ending the episode.
`submit`	Submit a final controller C(s) for grading. Returns metrics and reward. Ends the episode.

Both tools accept the controller transfer function as two coefficient lists, `num` (numerator) and `den` (denominator), each a list of floats in descending powers of s.
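
Common controller structures map to these coefficient lists by clearing fractions. In the sketch below, the helper names are hypothetical:

- An ideal PID Kp + Ki/s + Kd·s becomes num = [Kd, Kp, Ki], den = [1, 0]. The description does not say whether improper controllers (numerator degree above denominator degree) are accepted, so an optional derivative filter Kd·s/(τ_f·s + 1) is included to keep C(s) proper.
- A first-order lead or lag compensator K(s + z)/(s + p) becomes num = [K, K·z], den = [1, p].

```python
def pid_to_tf(kp, ki, kd, tau_f=0.0):
    """C(s) = kp + ki/s + kd*s/(tau_f*s + 1) as (num, den), descending powers of s.

    tau_f = 0 gives the ideal (improper) PID (kd s^2 + kp s + ki) / s.
    """
    if tau_f == 0.0:
        return [kd, kp, ki], [1.0, 0.0]
    # Common denominator s * (tau_f*s + 1) = tau_f s^2 + s
    num = [kp * tau_f + kd, kp + ki * tau_f, ki]
    den = [tau_f, 1.0, 0.0]
    return num, den


def lead_lag_to_tf(k, zero, pole):
    """C(s) = k (s + zero) / (s + pole): lead if zero < pole, lag if zero > pole."""
    return [k, k * zero], [1.0, pole]
```

For example, `pid_to_tf(2.0, 1.0, 0.5, tau_f=0.25)` returns `([1.0, 2.25, 1.0], [0.25, 1.0, 0.0])`. These are the two lists the `num` and `den` arguments expect.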

Time Horizon

Multi-turn. Agents can call `evaluate` repeatedly to refine their controller design before calling `submit`. Typical solutions require 1–10 `evaluate` calls.
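
The evaluate-then-submit pattern can be illustrated on a first-order stable plant G(s) = K/(τs + 1), assuming K > 0 and τ > 0. The sketch is one possible strategy, not the environment's or the paper's.

It uses a PI controller C(s) = k_c(τs + 1)/(τs) whose zero cancels the plant pole, so that L(s) = k_c·K/(τs). The closed loop is then first order, with time constant τ/(k_c·K) and 2% settling time (τ/(k_c·K))·ln 50.

`mock_evaluate` is a hypothetical analytic stand-in for the real `evaluate` tool, valid only for this controller form, and its dictionary keys are invented. Only the settling-time window is tuned, because this design always has a 90° phase margin and zero steady-state step error.

```python
import math


def mock_evaluate(num, den, plant_gain, plant_tau):
    """Hypothetical stand-in for `evaluate`, valid ONLY for
    num = [kc*tau, kc], den = [tau, 0] with G(s) = K / (tau s + 1)."""
    kc = num[1]
    pole = -kc * plant_gain / plant_tau  # the single closed-loop pole
    return {
        "stable": pole < -0.01,
        "phase_margin": 90.0,  # L(s) = kc K / (tau s) has phase -90 deg everywhere
        # 2% criterion: exp(pole * t) = 0.02  ->  t = ln(50) / -pole
        "settling_time": math.log(50.0) / -pole if pole < 0 else math.inf,
        "steady_state_error": 0.0,  # integrator in the loop
    }


def tune_pi(plant_gain, plant_tau, ts_min, ts_max, max_calls=10):
    """Rescale kc between mock `evaluate` calls until settling time is in range."""
    target = 0.5 * (ts_min + ts_max)
    kc = 1.0 / plant_gain  # initial guess: unity loop gain
    for call in range(1, max_calls + 1):
        num, den = [kc * plant_tau, kc], [plant_tau, 0.0]
        metrics = mock_evaluate(num, den, plant_gain, plant_tau)
        if metrics["stable"] and ts_min <= metrics["settling_time"] <= ts_max:
            return num, den, call  # this design would be passed to `submit`
        # Settling time is proportional to 1/kc here, so rescale toward the target.
        kc *= metrics["settling_time"] / target
    raise RuntimeError("no passing design within the call budget")
```

Take K = 2, τ = 5 and a settling window of [1, 2] s. The first call gives a settling time of about 19.6 s, and one rescaling of k_c lands at 1.5 s, so `tune_pi(2.0, 5.0, 1.0, 2.0)` succeeds on its second call. Plants with extra poles or delay do not scale this predictably, which is why several `evaluate` calls are typically needed.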

Environment Difficulty

Difficulty varies by category. First-order stable systems are easiest; higher-order and unstable systems are hardest. The original ControlAgent paper reports 53–95% success rates across categories using GPT-4 with domain-specific prompting.

Safety

This environment involves mathematical computation only. No safety concerns.

Citations

@article{guo2024controlagent,
  title={ControlAgent: Automating Control System Design via Novel Integration of LLM Agents and Domain Expertise},
  author={Guo, Xingang and Keivan, Darioush and Syed, Usman and Qin, Lianhui and Zhang, Huan and Dullerud, Geir and Seiler, Peter and Hu, Bin},
  journal={arXiv preprint arXiv:2410.19811},
  year={2024}
}