PowerGrid

Type: RL Env
Runtime: ORS
License: unknown
Size: 40 tasks
Published: Mar 2026

⭐ OpenReward Environment

Description

PowerGrid is a power grid environment where agents dispatch generators, manage battery storage, handle renewable variability, and maintain grid frequency across crisis scenarios inspired by the 2021 Texas winter storm, the 2003 Northeast blackout, and the 2016 South Australia blackout.

Note: this is a synthetic environment that is mostly AI-generated; we recommend testing it thoroughly before integrating it into an RL pipeline.

Capabilities

  • Economic dispatch optimization across 8 thermal generators with quadratic cost curves
  • Frequency regulation via governor droop response and under-frequency load shedding
  • Grid-scale battery storage management (200 MW / 800 MWh, 85% round-trip efficiency)
  • Renewable integration (500 MW wind, 300 MW solar) with curtailment decisions
  • Emergency load shedding and restoration across 3 transmission zones
  • Transmission congestion management with N-1 contingency constraints
  • Multi-day crisis management (up to 72 hours in polar vortex scenario)
  • Dense, multi-component reward signal across 5 dimensions
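
To illustrate the economic dispatch capability listed above, here is a minimal sketch of the classic equal-incremental-cost method for quadratic cost curves, solved by bisection on the system marginal cost. It is not the environment's own solver. The cost form C(P) = a + b·P + c·P², the function name, and the three example units are assumptions for illustration; the real generator parameters are not given in this document.

```python
def economic_dispatch(units, demand_mw, tol=1e-9):
    """Dispatch units with cost a + b*P + c*P**2 to meet demand at least cost.

    units: list of (b, c, p_min, p_max) tuples (hypothetical parameters).
    Returns a list of MW set-points, one per unit.
    """
    total_min = sum(u[2] for u in units)
    total_max = sum(u[3] for u in units)
    if not total_min <= demand_mw <= total_max:
        raise ValueError("demand outside the combined generator limits")

    def outputs_at(lam):
        # Each unit runs where its marginal cost b + 2*c*P equals lam,
        # clamped to its operating limits.
        return [min(max((lam - b) / (2 * c), p_min), p_max)
                for b, c, p_min, p_max in units]

    # Bracket lam between the lowest and highest possible marginal cost.
    lo = min(b + 2 * c * p_min for b, c, p_min, _ in units)
    hi = max(b + 2 * c * p_max for b, c, _, p_max in units)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(outputs_at(mid)) < demand_mw:
            lo = mid
        else:
            hi = mid
    return outputs_at(hi)


# Hypothetical three-unit system, not the environment's 8 generators.
example_units = [(20.0, 0.010, 50.0, 300.0),
                 (25.0, 0.020, 50.0, 200.0),
                 (30.0, 0.015, 20.0, 150.0)]
example_dispatch = economic_dispatch(example_units, demand_mw=400.0)
```

Total output is non-decreasing in the marginal cost, so bisection converges. Units pinned at a limit simply stop responding to further changes in the marginal cost.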

License

MIT

Tasks

There are 4 training scenarios (5 seeds each = 20 training tasks):

  • summer_peak: Normal hot summer day dispatch optimization. Evening ramp challenge as solar fades and AC load peaks.
  • wind_drought: Wind drops from 80% to 5% capacity over 2 hours. Tests proactive thermal ramp-up and reserve management.
  • cold_snap: Extreme cold (-20 °C), demand surges to 5,250 MW, gas supply curtailed, generator trips. Inspired by the February 2021 Texas winter storm.
  • line_outage: Major transmission line trips followed by a generator trip (N-1-1 contingency). Tests transmission-aware redispatch.

There are also 4 test scenarios (5 seeds each = 20 test tasks):

  • cascading_failure: Sequential line and generator trips leading to frequency instability. Inspired by the August 2003 Northeast blackout.
  • renewable_surplus: Low demand weekend with excessive wind and solar. Tests minimum generation management and frequency stability with low inertia.
  • polar_vortex: 72-hour multi-day extreme cold event with progressive generator deratings and trips. Tests long-horizon strategic planning.
  • price_spike_crisis: Extreme heat wave drives demand beyond capacity. Political pressure limits acceptable load shedding duration.

Each 24-hour scenario has 96 timesteps (15 minutes each). The polar_vortex scenario has 288 timesteps (72 hours).
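
The task and timestep counts above follow from simple arithmetic, sketched below. The scenario names and horizons come from this page. The seed indices 0 to 4 and the (scenario, seed) tuple layout are assumptions, since the actual task identifiers are not documented here.

```python
from itertools import product

TIMESTEP_MINUTES = 15
SEEDS_PER_SCENARIO = 5

TRAIN_SCENARIOS = ["summer_peak", "wind_drought", "cold_snap", "line_outage"]
TEST_SCENARIOS = ["cascading_failure", "renewable_surplus",
                  "polar_vortex", "price_spike_crisis"]

# Every scenario lasts 24 hours except polar_vortex, which lasts 72.
HORIZON_HOURS = {name: 24 for name in TRAIN_SCENARIOS + TEST_SCENARIOS}
HORIZON_HOURS["polar_vortex"] = 72


def timesteps(scenario):
    """Number of 15-minute timesteps in a scenario."""
    return HORIZON_HOURS[scenario] * 60 // TIMESTEP_MINUTES


def enumerate_tasks(scenarios):
    """All (scenario, seed) pairs; the seed numbering is an assumption."""
    return list(product(scenarios, range(SEEDS_PER_SCENARIO)))


train_tasks = enumerate_tasks(TRAIN_SCENARIOS)
test_tasks = enumerate_tasks(TEST_SCENARIOS)
```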

Reward Structure

This is a dense, verifiable reward environment. Rewards are calculated per timestep as a weighted sum of five components:

  • Reliability (40%): Penalty for unserved energy (load shedding)
  • Cost Efficiency (25%): Lower generation cost relative to baseline
  • Frequency Stability (15%): Penalty for frequency deviation from 60 Hz
  • Reserve Adequacy (10%): Penalty if spinning reserves fall below NERC requirement
  • Renewable Utilization (10%): Bonus for using available renewables without curtailment

A terminal reward of -1.0 is applied on total blackout (frequency collapse below 57.5 Hz). We do not use LLM graders.
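
A minimal sketch of how the weighted sum could be combined is shown below. The weights and the 57.5 Hz threshold come from this section. The component key names, the assumption that each component is already a normalized score, and the choice to let the blackout reward replace (rather than add to) the step reward are illustrative guesses, not the environment's documented behaviour.

```python
# Weights as listed in the Reward Structure section.
WEIGHTS = {
    "reliability": 0.40,
    "cost_efficiency": 0.25,
    "frequency_stability": 0.15,
    "reserve_adequacy": 0.10,
    "renewable_utilization": 0.10,
}
BLACKOUT_HZ = 57.5
BLACKOUT_REWARD = -1.0


def step_reward(components, frequency_hz):
    """Weighted per-timestep reward from normalized component scores.

    components: dict with one score per key in WEIGHTS (assumed normalized).
    """
    if frequency_hz < BLACKOUT_HZ:
        # Assumption: the terminal blackout reward replaces the step reward.
        return BLACKOUT_REWARD
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


perfect = {name: 1.0 for name in WEIGHTS}
normal_step = step_reward(perfect, frequency_hz=60.0)
blackout_step = step_reward(perfect, frequency_hz=57.0)
```

Because reliability carries 40% of the weight, under these assumptions a timestep with a zero reliability score cannot exceed 0.6 even if every other component is perfect.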

Tools

Agents have 11 tools:

| Tool | Time Advance | Description |
| --- | --- | --- |
| observe_grid | No | Read full grid state: frequency, demand, generation, reserves, weather, costs |
| dispatch_generators | Yes | Set MW output targets for one or more generators |
| control_battery | Yes | Charge, discharge, or idle the 200 MW battery |
| manage_reserves | Yes | Set spinning reserve target (advisory) |
| shed_load | Yes | Emergency load shedding by zone (last resort) |
| restore_load | Yes | Restore previously shed load |
| start_generator | Yes | Begin startup of an offline unit |
| stop_generator | Yes | Begin shutdown of an online unit |
| curtail_renewable | Yes | Limit wind or solar output |
| advance_time | Yes | Move to next 15-minute timestep |
| submit_log | No | Document reasoning (no simulation effect) |
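
Since most tools advance the simulation clock, it can help to track how much of the horizon a planned sequence of calls would use. The sketch below transcribes the "Time Advance" column into a lookup table. It assumes that each time-advancing call consumes exactly one 15-minute timestep, which the table suggests but does not state outright. The helper name is hypothetical.

```python
# Tool name -> whether calling it advances simulated time (from the table).
ADVANCES_TIME = {
    "observe_grid": False,
    "dispatch_generators": True,
    "control_battery": True,
    "manage_reserves": True,
    "shed_load": True,
    "restore_load": True,
    "start_generator": True,
    "stop_generator": True,
    "curtail_renewable": True,
    "advance_time": True,
    "submit_log": False,
}


def timesteps_consumed(calls):
    """Count timesteps used by a sequence of tool names.

    Assumption: every time-advancing call uses exactly one timestep.
    """
    return sum(1 for name in calls if ADVANCES_TIME[name])


plan = ["observe_grid", "dispatch_generators", "submit_log", "advance_time"]
used = timesteps_consumed(plan)
```

Under this assumption, observe_grid and submit_log are free in simulated time, so an agent can observe and log as often as needed between control actions.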

Time Horizon

Each scenario runs for 96 timesteps (24 hours), except for the polar_vortex scenario, which runs for 288 timesteps (72 hours). Each timestep represents 15 minutes of simulated time.

Other Environment Requirements

There are no further environment requirements; PowerGrid works out of the box with the OpenReward endpoint without any external secrets.

Safety

Agents in PowerGrid are tasked with operating a power grid simulation where their decisions affect the reliability of electricity supply to ~2 million simulated customers. The environment does not present direct real-world safety risks as all interactions occur within a self-contained simulation. The environment teaches agents to balance economic efficiency against reliability, with heavy penalties for blackouts and load shedding, which aligns with responsible grid operation practices.

Citations

@dataset{GRPowerGrid,
  author    = {General Reasoning Inc. Team},
  title     = {PowerGrid},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://openreward.ai/GeneralReasoning/PowerGrid}
}