MarsExplorer
Description
MarsExplorer is an environment for evaluating agents on grid-based terrain exploration and coverage. It is based on the MarsExplorer environment by Koutras et al. The agent controls a rover navigating a procedurally generated 2D grid with obstacles, using a simulated LIDAR sensor to reveal unknown terrain. The goal is to explore as much of the map as possible while avoiding obstacles and staying within bounds.
Capabilities
- Spatial reasoning and navigation planning on ASCII grid maps
- Obstacle avoidance from partial observations
- Exploration strategy under step budgets
- Adapting to procedurally-generated terrain of varying difficulty
Compute Requirements
MarsExplorer does not require a sandbox. All game logic runs in-process with minimal compute.
License
Tasks
There are 1,000 training tasks across three difficulty tiers:
- Small (334 tasks): 11x11 grid, 5 obstacles, 150 max steps, LIDAR range 4
- Medium (333 tasks): 21x21 grid, 12 obstacles, 400 max steps, LIDAR range 6
- Large (333 tasks): 41x41 grid, 30 obstacles, 1000 max steps, LIDAR range 8
Each task uses a fixed seed for reproducible map generation. The agent starts at position (0, 0) and must explore the grid by issuing directional move commands.
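The tier parameters above can be captured in a small configuration table. The following is only a sketch for reference: the `Tier` class, its field names, and the `TIERS` list are illustrative and are not part of the environment's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Hypothetical container for one difficulty tier (names are illustrative)."""
    name: str
    num_tasks: int
    grid_size: int      # the grid is grid_size x grid_size
    num_obstacles: int
    max_steps: int
    lidar_range: int


# Values copied from the tier list above.
TIERS = [
    Tier("small", 334, 11, 5, 150, 4),
    Tier("medium", 333, 21, 12, 400, 6),
    Tier("large", 333, 41, 30, 1000, 8),
]

# Sanity checks: total task count and number of cells per grid.
total_tasks = sum(t.num_tasks for t in TIERS)
cells = {t.name: t.grid_size ** 2 for t in TIERS}
```

Here `total_tasks` comes to 1,000 and the medium grid has 441 cells, which matches the `151/441` denominator in the sample observation.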
Reward Structure
This is a dense, verifiable reward environment matching the original MarsExplorer reward structure. No LLM graders are used.
Per-step reward (every move):
new_explored_cells - movement_cost, where movement_cost = 0.2
- Exploring new terrain yields positive reward; revisiting explored areas costs 0.2
Terminal rewards:
- 95%+ explored (success): +400 bonus
- Collision with obstacle: -400 penalty, episode ends
- Out of bounds: -400 penalty, episode ends
- Max steps reached: episode ends with final step reward
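The reward rules above can be sketched as a single function. This is an illustrative reading, not the environment's actual code: the function name and arguments are hypothetical, and two details are assumptions the text does not settle, namely that a collision or out-of-bounds move returns the penalty alone, and that the success bonus is added on top of the final step reward.

```python
MOVEMENT_COST = 0.2
SUCCESS_BONUS = 400.0
FAILURE_PENALTY = -400.0
SUCCESS_THRESHOLD = 0.95  # fraction of the grid that must be explored


def step_reward(new_explored_cells, explored_fraction,
                collided=False, out_of_bounds=False):
    """Return the reward for one move under the assumptions stated above."""
    # Assumption: a failed move yields only the penalty, with no step reward.
    if collided or out_of_bounds:
        return FAILURE_PENALTY
    # Dense per-step term: newly revealed cells minus the fixed movement cost.
    reward = new_explored_cells - MOVEMENT_COST
    # Assumption: the success bonus is added to the step reward.
    if explored_fraction >= SUCCESS_THRESHOLD:
        reward += SUCCESS_BONUS
    return reward
```

Under this reading, a move that reveals 7 new cells mid-episode earns 6.8, a move over already explored terrain earns -0.2, and any collision earns -400.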
Tools
move(direction): Move the rover one cell in a cardinal direction ("up", "down", "left", "right"). Returns an ASCII map observation showing the current state of exploration, along with step count, position, and exploration percentage.
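Because an out-of-bounds move ends the episode, an agent may want to check a move locally before issuing it. The sketch below assumes positions are (x, y) with x as the column and y as the row, and that "up" decreases y. The first assumption is consistent with the sample observation in this document, where the rover at (5, 3) appears in row 3, column 5. The second is a guess about the axis orientation. `DELTAS` and `next_position` are hypothetical helpers, not part of the tool.

```python
# Assumed direction-to-offset mapping: (dx, dy), with y growing downward.
DELTAS = {
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
}


def next_position(pos, direction, grid_size):
    """Return the cell a move would reach and whether it is inside the grid."""
    dx, dy = DELTAS[direction]
    x, y = pos[0] + dx, pos[1] + dy
    in_bounds = 0 <= x < grid_size and 0 <= y < grid_size
    return (x, y), in_bounds
```

With these assumptions, moving "right" from the start cell (0, 0) on an 11x11 grid stays in bounds, while moving "up" or "left" would leave the grid.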
Map Representation
The agent receives a text-based ASCII grid after each move:
Step: 15/400 | Position: (5, 3) | Explored: 34.2% (151/441)
  01234567890123456789
0 ..............??????
1 ..............??????
2 ......##......??????
3 .....@.#......??????
4 ..............??????
...
Legend: @ = rover, # = obstacle, . = explored, ? = unexplored
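An agent harness will usually need to turn this text back into structured data. The following sketch parses the status line and a single map row with the standard `re` module. The exact layout is inferred from the sample above and may differ in practice. `parse_status` and `parse_row` are hypothetical helper names.

```python
import re

# Pattern inferred from the sample status line; treat it as an assumption.
STATUS_RE = re.compile(
    r"Step:\s*(\d+)/(\d+)\s*\|\s*"
    r"Position:\s*\((\d+),\s*(\d+)\)\s*\|\s*"
    r"Explored:\s*(\d+(?:\.\d+)?)%\s*\((\d+)/(\d+)\)"
)


def parse_status(line):
    """Extract step count, position, and exploration stats from the status line."""
    m = STATUS_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized status line: {line!r}")
    step, max_steps, x, y, pct, explored, total = m.groups()
    return {
        "step": int(step),
        "max_steps": int(max_steps),
        "position": (int(x), int(y)),
        "explored_pct": float(pct),
        "explored_cells": int(explored),
        "total_cells": int(total),
    }


def parse_row(line):
    """Split a map row such as '3 .....@.#' into (row_index, cell_string)."""
    label, cells = line.split(maxsplit=1)
    return int(label), cells


status = parse_status("Step: 15/400 | Position: (5, 3) | Explored: 34.2% (151/441)")
row_index, row_cells = parse_row("3 .....@.#......??????")
```

For the sample observation, `status["position"]` is `(5, 3)` and `row_cells.index("@")` is 5, so the first coordinate is the column of the rover.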
Other Environment Requirements
There are no further environment requirements. MarsExplorer works out of the box with the OpenReward endpoint without any secrets.
Safety
MarsExplorer is a grid navigation task with no safety concerns. The agent interacts only with an abstract grid world and cannot affect any real systems.
Citations
@article{koutras2021marsexplorer,
title={MarsExplorer: Exploration of Unknown Terrains via Deep Reinforcement Learning and Procedurally Generated Environments},
author={Koutras, Dimitrios I. and Kapoutsis, Athanasios Ch. and Amanatiadis, Angelos A. and Kosmatopoulos, Elias B.},
journal={Electronics},
volume={10},
number={22},
pages={2751},
year={2021},
publisher={MDPI},
doi={10.3390/electronics10222751}
}