PowerGrid

Description

PowerGrid is a power grid environment where agents dispatch generators, manage battery storage, handle renewable variability, and maintain grid frequency across crisis scenarios inspired by the 2021 Texas winter storm, the 2003 Northeast blackout, and the 2016 South Australia blackout.

Note: this is a synthetic environment which is majority AI-generated; we recommend testing thoroughly before integrating into an RL pipeline.

Capabilities

Economic dispatch optimization across 8 thermal generators with quadratic cost curves
Frequency regulation via governor droop response and under-frequency load shedding
Grid-scale battery storage management (200 MW / 800 MWh, 85% round-trip efficiency)
Renewable integration (500 MW wind, 300 MW solar) with curtailment decisions
Emergency load shedding and restoration across 3 transmission zones
Transmission congestion management with N-1 contingency constraints
Multi-day crisis management (up to 72 hours in polar vortex scenario)
Dense, multi-component reward signal across 5 dimensions

License

MIT

Tasks

There are 4 training scenarios (5 seeds each = 20 training tasks):

summer_peak: Normal hot summer day dispatch optimization. Evening ramp challenge as solar fades and AC load peaks.
wind_drought: Wind drops from 80% to 5% capacity over 2 hours. Tests proactive thermal ramp-up and reserve management.
cold_snap: Extreme cold (-20C), demand surges to 5,250 MW, gas supply curtailed, generator trips. Inspired by the February 2021 Texas winter storm.
line_outage: Major transmission line trips followed by a generator trip (N-1-1 contingency). Tests transmission-aware redispatch.

And 4 test scenarios (5 seeds each = 20 test tasks):

cascading_failure: Sequential line and generator trips leading to frequency instability. Inspired by the August 2003 Northeast blackout.
renewable_surplus: Low demand weekend with excessive wind and solar. Tests minimum generation management and frequency stability with low inertia.
polar_vortex: 72-hour multi-day extreme cold event with progressive generator deratings and trips. Tests long-horizon strategic planning.
price_spike_crisis: Extreme heat wave drives demand beyond capacity. Political pressure limits acceptable load shedding duration.

Each 24-hour scenario has 96 timesteps (15 minutes each). The polar_vortex scenario has 288 timesteps (72 hours).

Reward Structure

This is a dense, verifiable reward environment. Rewards are calculated per timestep as a weighted sum of five components:

Reliability (40%): Penalty for unserved energy (load shedding)
Cost Efficiency (25%): Lower generation cost relative to baseline
Frequency Stability (15%): Penalty for frequency deviation from 60 Hz
Reserve Adequacy (10%): Penalty if spinning reserves fall below NERC requirement
Renewable Utilization (10%): Bonus for using available renewables without curtailment

Terminal reward of -1.0 for total blackout (frequency collapse below 57.5 Hz). We do not use LLM graders.

Tools

Agents have 11 tools:

Tool	Time Advance	Description
`observe_grid`	No	Read full grid state: frequency, demand, generation, reserves, weather, costs
`dispatch_generators`	Yes	Set MW output targets for one or more generators
`control_battery`	Yes	Charge, discharge, or idle the 200 MW battery
`manage_reserves`	Yes	Set spinning reserve target (advisory)
`shed_load`	Yes	Emergency load shedding by zone (last resort)
`restore_load`	Yes	Restore previously shed load
`start_generator`	Yes	Begin startup of an offline unit
`stop_generator`	Yes	Begin shutdown of an online unit
`curtail_renewable`	Yes	Limit wind or solar output
`advance_time`	Yes	Move to next 15-minute timestep
`submit_log`	No	Document reasoning (no simulation effect)

Time Horizon

Each scenario runs for 96 timesteps (24 hours) except for the polar_vortex scenario which runs for 288 timesteps (72 hours). Each timestep represents 15 minutes of simulated time.

Other Environment Requirements

There are no further environment requirements; PowerGrid works out of the box with the OpenReward endpoint without any external secrets.

Safety

Agents in PowerGrid are tasked with operating a power grid simulation where their decisions affect the reliability of electricity supply to ~2 million simulated customers. The environment does not present direct real-world safety risks as all interactions occur within a self-contained simulation. The environment teaches agents to balance economic efficiency against reliability, with heavy penalties for blackouts and load shedding, which aligns with responsible grid operation practices.

Citations

@dataset{GRPowerGrid,
  author    = {General Reasoning Inc. Team},
  title     = {PowerGrid},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://openreward.ai/GeneralReasoning/PowerGrid}
}