Multimodal aim training environment for vision-language model evaluation and RL training.

| Property | Value |
|---|---|
| Environment ID | aim-labs-env |
| Tags | multi-turn, multimodal, vision, tool-use, train, eval |
| Screen Resolution | 1024x768 pixels |
| Default Turns | 20 per game |

Each rollout is a game where the agent must click on red target circles:
- Agent sees a 1024x768 image with a red target at a random position
- Agent calls `click(x=..., y=...)` to click on the target
- Hit = click within target radius, Miss = click outside
- Reward = hits / attempts (accuracy from 0.0 to 1.0)
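
The hit rule and reward above can be sketched in a few lines of Python. This is only an illustration, not the environment's actual code: the names `is_hit` and `accuracy` are hypothetical, and treating a click exactly on the circle's edge as a hit is an assumption the rules above do not settle.

```python
import math


def is_hit(click_x, click_y, target_x, target_y, radius):
    """Return True when the click lands within `radius` pixels of the target center."""
    # Assumption: a click exactly on the boundary counts as a hit.
    return math.hypot(click_x - target_x, click_y - target_y) <= radius


def accuracy(hits, attempts):
    """Reward as hits / attempts; defined as 0.0 when no clicks were made."""
    return hits / attempts if attempts else 0.0


# Two illustrative clicks against a target centered at (512, 384), radius 60 px.
clicks = [(500, 400), (100, 100)]
results = [is_hit(x, y, 512, 384, radius=60) for x, y in clicks]
print(results, accuracy(sum(results), len(results)))
```

The first click is 20 px from the center and counts as a hit; the second is far outside the radius, so the reward for these two attempts is 0.5.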

`click(x: int, y: int)`

| Parameter | Type | Description |
|---|---|---|
| x | int | Horizontal position (0 = left edge, 1024 = right edge) |
| y | int | Vertical position (0 = top edge, 768 = bottom edge) |

```bash
# Basic evaluation
prime eval run aim-labs-env -m qwen/qwen3-vl-235b-a22b-instruct

# With options
prime eval run aim-labs-env \
  -m openai/gpt-4o \
  -n 10 -r 3 \
  -a '{"difficulty": "easy", "max_turns": 10}'

# Demo mode (saves click visualizations)
prime eval run aim-labs-env \
  -m qwen/qwen3-vl-235b-a22b-instruct \
  -n 1 -r 1 \
  -a '{"demo": true, "max_turns": 5}'
```

| Argument | Type | Default | Description |
|---|---|---|---|
| difficulty | str | "medium" | Target size preset |
| max_turns | int | 20 | Targets per game |
| num_examples | int | 100 | Number of game sessions |
| seed | int | None | Random seed |
| demo | bool | False | Save click visualization images |
| demo_output | str | "./demo_output" | Directory for demo images |


| Difficulty | Target Radius | Use Case |
|---|---|---|
| easy | 100px | Basic multimodal capability testing |
| medium | 60px | Standard difficulty |
| hard | 35px | Precise coordinate estimation |

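
To put the radii in context, the sketch below estimates the accuracy a uniformly random clicker would reach at each difficulty, as the ratio of target area to screen area. It assumes the whole target circle lies on screen and that clicks are uniform over the 1024x768 image; neither is stated by the environment, and `random_click_hit_rate` is a hypothetical helper.

```python
import math

SCREEN_W, SCREEN_H = 1024, 768
RADII = {"easy": 100, "medium": 60, "hard": 35}  # pixels, from the table above


def random_click_hit_rate(radius):
    """Fraction of the screen covered by a fully visible target of this radius."""
    # Assumption: the target circle is never clipped by the screen edge.
    return math.pi * radius**2 / (SCREEN_W * SCREEN_H)


for name, radius in RADII.items():
    print(f"{name}: {random_click_hit_rate(radius):.2%}")
```

Under these assumptions the chance-level accuracy is roughly 4% on easy, 1.4% on medium, and 0.5% on hard, so rewards well above those values indicate genuine localization rather than guessing.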

| Metric | Description |
|---|---|
| reward | Accuracy (hits / attempts) |
| total_hits | Successful clicks |
| total_attempts | Total clicks made |
| average_distance | Mean distance from click to target center (px) |

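
As a rough illustration of how these four metrics relate to one another, the following sketch aggregates them from per-turn click and target positions. The `Turn` record and `summarize` function are hypothetical and do not come from the environment. The sketch also assumes that every turn yields exactly one click and that distance is Euclidean.

```python
import math
from dataclasses import dataclass


@dataclass
class Turn:
    click: tuple   # (x, y) chosen by the agent
    target: tuple  # (x, y) of the target center


def summarize(turns, radius):
    """Aggregate per-turn clicks into the metrics listed in the table above."""
    distances = [math.dist(t.click, t.target) for t in turns]
    attempts = len(distances)
    hits = sum(d <= radius for d in distances)  # assumption: boundary counts as a hit
    return {
        "reward": hits / attempts if attempts else 0.0,
        "total_hits": hits,
        "total_attempts": attempts,
        "average_distance": sum(distances) / attempts if attempts else 0.0,
    }


# Three illustrative turns against the same target, medium difficulty (60 px).
turns = [
    Turn((512, 384), (512, 384)),  # dead center, distance 0
    Turn((515, 388), (512, 384)),  # distance 5
    Turn((612, 384), (512, 384)),  # distance 100, a miss
]
print(summarize(turns, radius=60))
```

A low `average_distance` paired with a low `reward` would suggest near misses, which the accuracy figure alone cannot show.
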
User: Turn 1/20 - Click the target: [image]