MobileEnv

Description

MobileEnv is an environment for evaluating language model agents on wireless mobile network coordination tasks. Agents act as centralized network controllers. At each timestep they decide which base station (BS) each user equipment (UE) device should connect to, with the goal of maximizing overall Quality of Experience, measured as utility. The environment wraps the mobile-env Gymnasium simulation.

Capabilities

Analyzing wireless network state (connections, signal strength, utility)
Sequential resource allocation decisions over 100 timesteps
Load balancing across multiple base stations
Adapting to dynamic user movement and changing signal conditions

Compute Requirements

MobileEnv runs a lightweight procedural simulation. It does not require a sandbox and has minimal compute requirements.

License

MIT.

Tasks

There are 1000 tasks, all in a single train split, spread across 3 scenario sizes with varying random seeds:

Scenario	Base Stations	User Equipment	Tasks
Small	3	5	333
Medium	7	15	333
Large	13	30	334

Each task runs for 100 timesteps. At each timestep, the agent observes the network state and decides which base station each UE should connect to. Different random seeds produce different user movement trajectories and initial placements.
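The scenario table can be expressed as a small configuration structure. This is a hypothetical sketch. The `Scenario` class, the `enumerate_tasks` helper, and the use of seeds 0 through n-1 within each scenario are illustrative assumptions, because the actual seed values and task ordering are not specified here. The sketch also counts the per-UE connection choices in one episode (number of UEs × 100 timesteps).

```python
from dataclasses import dataclass
from typing import List, Tuple

TIMESTEPS = 100  # each task runs for 100 timesteps


@dataclass(frozen=True)
class Scenario:
    name: str
    base_stations: int
    user_equipment: int
    num_tasks: int


# Sizes and task counts taken from the Tasks table.
SCENARIOS = [
    Scenario("small", 3, 5, 333),
    Scenario("medium", 7, 15, 333),
    Scenario("large", 13, 30, 334),
]


def enumerate_tasks(scenarios: List[Scenario]) -> List[Tuple[str, int]]:
    """Hypothetical task list: one (scenario name, seed) pair per task.

    Seeds 0..num_tasks-1 are an assumption made for illustration only.
    """
    return [(s.name, seed) for s in scenarios for seed in range(s.num_tasks)]


def decisions_per_task(s: Scenario) -> int:
    """Per-UE connection choices submitted over one full episode."""
    return s.user_equipment * TIMESTEPS


tasks = enumerate_tasks(SCENARIOS)
print(len(tasks), [decisions_per_task(s) for s in SCENARIOS])
```

Under these assumptions, a large-scenario episode involves 30 × 100 = 3000 individual connection choices, spread over 100 step calls.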

Reward Structure

This is a dense, verifiable reward environment. After each timestep, the reward is the average utility across all user equipment devices:

$$\text{reward}_t = \frac{1}{N_{UE}} \sum_{i=1}^{N_{UE}} u_i$$

where $u_i \in [-1, 1]$ is the bounded log utility of UE $i$. The final reward at episode completion is the cumulative sum of all step rewards.
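A minimal sketch of the reward arithmetic, assuming the per-UE utilities for each timestep are available as plain lists of floats. The function names `step_reward` and `episode_reward` are hypothetical and not part of the environment's API.

```python
from typing import Sequence


def step_reward(utilities: Sequence[float]) -> float:
    """Average bounded log utility across all UEs for one timestep."""
    if not utilities:
        raise ValueError("need at least one UE")
    for u in utilities:
        if not -1.0 <= u <= 1.0:
            raise ValueError(f"utility {u} outside [-1, 1]")
    return sum(utilities) / len(utilities)


def episode_reward(per_step_utilities: Sequence[Sequence[float]]) -> float:
    """Cumulative sum of step rewards over the whole episode."""
    return sum(step_reward(us) for us in per_step_utilities)


print(step_reward([1.0, 0.5, -0.5]))
print(episode_reward([[0.5, 0.5]] * 100))
```

Here `step_reward([1.0, 0.5, -0.5])` is 1/3, and an episode in which every step averages 0.5 accumulates 50.0. Because each step reward lies in [-1, 1], the cumulative reward over a 100-step episode lies in [-100, 100].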

We do not use LLM graders for this task.

Data

MobileEnv uses procedurally generated simulation data from the mobile-env package. No external datasets are required. Different random seeds produce different user placements and movement patterns.

Tools

Agents are given two tools:

observe: View the current network state without advancing time. Returns each UE's current connections, reachability (which BSs have sufficient signal for a connection), relative signal-to-noise ratios (SNRs), and utility.
step: Submit connection decisions for all UEs and advance the simulation by one timestep. Returns the step reward and new network state.

Note: SNR values are relative per UE (the UE's best BS = 1.0) rather than absolute signal strengths. The "Reachable" field lists the BSs a UE can actually connect to, based on a threshold on the raw signal. Connection attempts to unreachable BSs fail silently. Reachability changes over time as UEs move.
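Because relative SNR and reachability are reported separately, a policy will usually filter to reachable BSs before ranking by SNR. The sketch below illustrates this under stated assumptions. The dictionary shapes (UE id to `{bs: relative SNR}`, UE id to a set of reachable BS ids) are hypothetical, not the tools' actual output format. The load discount in `assign_greedy` is one illustrative load-balancing heuristic, assuming a BS's capacity is shared among its connected UEs; it is not the environment's method and not claimed to be optimal.

```python
from typing import Dict, Optional, Set


def relative_snr(raw_snr: Dict[int, float]) -> Dict[int, float]:
    """Normalize one UE's linear SNRs so that its best BS reads 1.0."""
    best = max(raw_snr.values())
    if best <= 0:
        raise ValueError("expected positive linear SNR values")
    return {bs: snr / best for bs, snr in raw_snr.items()}


def assign_greedy(
    rel_snr: Dict[int, Dict[int, float]],  # hypothetical: UE id -> {BS id: relative SNR}
    reachable: Dict[int, Set[int]],        # hypothetical: UE id -> reachable BS ids
) -> Dict[int, Optional[int]]:
    """Hypothetical load-aware greedy assignment.

    Each UE, in id order, picks the reachable BS with the highest relative SNR
    divided by (1 + number of UEs already assigned to that BS). UEs with no
    reachable BS get None, since connecting to an unreachable BS fails anyway.
    """
    load: Dict[int, int] = {}
    choice: Dict[int, Optional[int]] = {}
    for ue in sorted(rel_snr):
        options = [bs for bs in rel_snr[ue] if bs in reachable.get(ue, set())]
        if not options:
            choice[ue] = None
            continue
        best = max(options, key=lambda bs: rel_snr[ue][bs] / (1 + load.get(bs, 0)))
        choice[ue] = best
        load[best] = load.get(best, 0) + 1
    return choice


print(relative_snr({0: 40.0, 1: 10.0, 2: 20.0}))
print(assign_greedy(
    {0: {0: 1.0, 1: 0.8}, 1: {0: 1.0, 1: 0.9}, 2: {0: 1.0}},
    {0: {0, 1}, 1: {0, 1}, 2: set()},
))
```

In the usage example, UE 0 takes BS 0; UE 1 then sees BS 0's score halved to 0.5 and prefers BS 1 (0.9); UE 2 has no reachable BS and gets None. Since normalization discards absolute strength, a relative SNR of 1.0 only means "best available for this UE", not "good link".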

Time Horizon

MobileEnv is a multi-turn environment with 100 sequential decisions per task (one per timestep), each requiring one step tool call. The agent may optionally call observe at any point.
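The episode control flow can be sketched with a stub. `StubMobileEnv`, `run_episode`, and the `(reward, state, done)` return shape are hypothetical stand-ins for the observe and step tools. They only show that observe is read-only, that each step call advances exactly one timestep, and that the episode reward is the running sum of step rewards. The stub returns a constant placeholder reward and does not simulate anything.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical decision format: UE id -> chosen BS id (None = no choice).
Decisions = Dict[int, Optional[int]]


class StubMobileEnv:
    """Hypothetical stand-in for the observe/step tools; NOT the real simulator."""

    def __init__(self, horizon: int = 100, placeholder_reward: float = 0.5):
        self.horizon = horizon
        self.placeholder_reward = placeholder_reward
        self.t = 0

    def observe(self) -> dict:
        # Read-only: returns the current state without advancing time.
        return {"timestep": self.t}

    def step(self, decisions: Decisions) -> Tuple[float, dict, bool]:
        # Advances exactly one timestep and reports whether the episode ended.
        if self.t >= self.horizon:
            raise RuntimeError("episode already finished")
        self.t += 1
        return self.placeholder_reward, self.observe(), self.t >= self.horizon


def run_episode(env: StubMobileEnv, policy: Callable[[dict], Decisions]) -> Tuple[float, int]:
    """Call step once per timestep until done; return (cumulative reward, step calls)."""
    obs = env.observe()  # optional: step also returns the new state
    total, calls, done = 0.0, 0, False
    while not done:
        reward, obs, done = env.step(policy(obs))
        total += reward
        calls += 1
    return total, calls


print(run_episode(StubMobileEnv(), lambda obs: {}))
```

With the default 100-step horizon, the loop makes exactly 100 step calls; extra observe calls cost nothing in simulated time.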

Other Environment Requirements

There are no further environment requirements; MobileEnv works out of the box without any secrets.

Safety

Agents in MobileEnv optimize wireless network connections in a simulated environment. The environment does not present safety risks, as agents only interact with an artificial simulation with no real-world network effects.

Citations

@inproceedings{schneider2022mobileenv,
  author    = {Schneider, Stefan and Werner, Stefan and Khalili, Ramin and Hecker, Artur and Karl, Holger},
  title     = {mobile-env: An Open Platform for Reinforcement Learning in Wireless Mobile Networks},
  booktitle = {IEEE/IFIP Network Operations and Management Symposium (NOMS)},
  year      = {2022},
  publisher = {IEEE}
}