flappybird
Overview
- Environment ID: flappybird
- Short description: Multi-turn Flappy Bird control environment where an LLM decides TAP or NOOP each tick to navigate through pipes.
- Tags: rl, game, control, verifiers, real-time
A physics-based Flappy Bird simulation where the agent controls a bird navigating through gaps in scrolling pipes. The bird falls due to gravity and must tap to jump, avoiding collisions with pipes and boundaries. Features progressive difficulty ramping from easy initial gaps to tighter challenges.
Datasets
- Primary dataset: Synthetic game states generated from fresh FlappyGame instances
- Source: Custom implementation with configurable physics
- Split sizes: Configurable via num_examples (default 10)
Task
- Type: Multi-turn real-time control
- Parser: vf.XMLParser(fields=["think", "actions"], answer_field="actions")
- Rubric overview:
  - flappy_reward_func (weight 2.0): Number of pipes successfully passed
  - flappy_survival_reward (weight 0.2): Accumulated survival bonus with time-based multiplier
  - Parser format reward (weight 0.2): Enforces <THINK>/<ACTIONS> XML structure
Action Format
The agent outputs one action per tick:
<THINK>Brief plan using physics projection</THINK>
<ACTIONS>[TAP]</ACTIONS>
or for no-op:
<THINK>Safe position, let gravity adjust</THINK>
<ACTIONS>[]</ACTIONS>
Available actions:
- [TAP]: Set vertical velocity to jump impulse (+1.6)
- []: No action, gravity applies (-0.3 per tick)
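As a rough illustration of the action format, the sketch below extracts the action from a model reply with a regular expression. It is not the environment's vf.XMLParser; the helper name parse_action, the "NOOP" label for an empty list, and the case-insensitive tag matching are assumptions made for this example.

```python
import re
from typing import Optional

# Hypothetical helper, not part of the environment: pulls the bracketed
# action list out of the <ACTIONS> tag (tag case is ignored by assumption).
_ACTIONS_RE = re.compile(
    r"<ACTIONS>\s*\[(.*?)\]\s*</ACTIONS>", re.IGNORECASE | re.DOTALL
)

def parse_action(reply: str) -> Optional[str]:
    """Return "TAP", "NOOP", or None when the reply is malformed."""
    match = _ACTIONS_RE.search(reply)
    if match is None:
        return None  # missing or broken <ACTIONS> tag
    body = match.group(1).strip().upper()
    if body == "TAP":
        return "TAP"
    if body == "":
        return "NOOP"  # empty list means no action this tick
    return None  # anything else is not a documented action
```

A reply such as `<THINK>...</THINK><ACTIONS>[TAP]</ACTIONS>` maps to "TAP", and `<ACTIONS>[]</ACTIONS>` maps to "NOOP".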
Game Physics
| Parameter | Default | Description |
|---|---|---|
| world_w | 24.0 | World width |
| world_h | 20.0 | World height (vertical bounds: ±10) |
| bird_x | 4.0 | Fixed horizontal bird position |
| bird_radius | 0.20 | Collision radius |
| gravity | -0.30 | Downward acceleration per tick |
| jump_impulse | 1.6 | Upward velocity on TAP |
| pipe_speed | 0.30 | Leftward pipe movement per tick |
| easy_gap_height | 6.0 | Initial gap height (easy mode) |
| base_gap_height | 8.0 | Final gap height (after ramp) |
| easy_mode_pipes | 6 | Pipes before difficulty ramp begins |
| ramp_pipes | 12 | Pipes over which difficulty increases |
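To make the vertical dynamics concrete, here is a minimal sketch of one tick using the default gravity and jump_impulse values from the table. The names BirdState and step_bird are hypothetical, and the update order (velocity first, then position) is an assumption; the actual FlappyGame implementation may integrate differently.

```python
from dataclasses import dataclass

@dataclass
class BirdState:
    y: float = 0.0   # vertical position
    vy: float = 0.0  # vertical velocity (positive = up)

def step_bird(state: BirdState, tap: bool,
              gravity: float = -0.30, jump_impulse: float = 1.6) -> BirdState:
    """Advance the bird by one tick (assumed update order)."""
    if tap:
        vy = jump_impulse          # TAP overwrites the velocity
    else:
        vy = state.vy + gravity    # no-op: gravity accumulates
    # Assumption: position is advanced with the already-updated velocity.
    return BirdState(y=state.y + vy, vy=vy)
```

Under these assumptions, a TAP from rest followed by a no-op moves the bird up by 1.6 and then by a further 1.3.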
Observation Format
<FLAPPY id=K>
<OBS>
birdY:By
birdX:Bx
velY:Vy
gapHeight:G
birdRadius:R
pipes:[(x1,gapY1),(x2,gapY2),...]
score:Z
</OBS>
</FLAPPY>
Where:
- By: Bird's vertical position (range: [-10, 10])
- Bx: Bird's horizontal position (fixed at 4.0)
- Vy: Vertical velocity (positive = up, negative = down)
- G: Current gap height
- R: Bird collision radius
- pipes: List of (x, gapY) for visible pipes
- Z: Current score (pipes passed)
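For scripted baselines or debugging, the observation block can be turned into a dictionary. The sketch below is one way to do that; parse_observation is a hypothetical helper, and it assumes values are printed as plain decimal numbers exactly in the layout shown above.

```python
import re

def parse_observation(text: str) -> dict:
    """Parse the key:value lines inside <OBS>...</OBS> into a dict."""
    obs_match = re.search(r"<OBS>(.*?)</OBS>", text, re.DOTALL)
    if obs_match is None:
        raise ValueError("no <OBS> block found")
    obs = {}
    for line in obs_match.group(1).strip().splitlines():
        key, _, value = line.strip().partition(":")
        if key == "pipes":
            # Each pipe is written as (x,gapY); collect them as float pairs.
            pairs = re.findall(
                r"\(\s*([-+\d.eE]+)\s*,\s*([-+\d.eE]+)\s*\)", value
            )
            obs[key] = [(float(x), float(y)) for x, y in pairs]
        elif key == "score":
            obs[key] = int(float(value))
        elif key:
            obs[key] = float(value)
    return obs
```

The result uses the field names from the observation itself (birdY, velY, pipes, and so on) as keys.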
Quickstart
Run with defaults:
uv run vf-eval flappybird
Configure model and game parameters:
uv run vf-eval flappybird \
  -m gpt-4.1-mini \
  -n 20 -r 3 -t 512 -T 0.7 \
  -a '{
    "max_turns": 500,
    "config_overrides": {
      "easy_gap_height": 7.0,
      "gravity": -0.25
    }
  }'
Notes:
- Use -a/--env-args for JSON kwargs forwarded to load_environment()
- Reports are written to ./environments/flappybird/reports/
Environment Arguments
| Arg | Type | Default | Description |
|---|---|---|---|
| max_turns | int | 300 | Maximum ticks before episode ends |
| num_examples | int | 10 | Number of game instances in dataset |
| config | FlappyConfig or null | null | Full config object (overrides defaults) |
| config_overrides | dict or null | null | Partial overrides merged with defaults |
Config Overrides
All FlappyConfig fields can be overridden:
| Field | Type | Default | Description |
|---|---|---|---|
| world_w | float | 24.0 | World width |
| world_h | float | 20.0 | World height |
| bird_x | float | 4.0 | Bird's fixed X position |
| bird_radius | float | 0.20 | Bird collision radius |
| gravity | float | -0.30 | Gravity acceleration |
| jump_impulse | float | 1.6 | TAP velocity |
| pipe_speed | float | 0.30 | Pipe scroll speed |
| pipe_spawn_interval | int | 30 | Ticks between pipe spawns |
| first_spawn_interval | int | 10 | Ticks before first pipe |
| base_gap_height | float | 8.0 | Final gap height |
| easy_gap_height | float | 6.0 | Initial gap height |
| easy_mode_pipes | int | 6 | Pipes at easy difficulty |
| ramp_pipes | int | 12 | Pipes during difficulty ramp |
| gap_center_offset | float | 2.0 | Max gap center deviation from center |
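The idea of partial overrides merged with defaults can be sketched with a dataclass and dataclasses.replace. FlappyConfigSketch below is a stand-in that copies only a few fields from the table, and apply_overrides is a hypothetical helper; how load_environment() actually merges config_overrides is not shown in this README.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class FlappyConfigSketch:
    # Stand-in for FlappyConfig with a subset of the documented defaults.
    gravity: float = -0.30
    jump_impulse: float = 1.6
    pipe_speed: float = 0.30
    easy_gap_height: float = 6.0
    base_gap_height: float = 8.0

def apply_overrides(overrides: Optional[dict] = None) -> FlappyConfigSketch:
    """Return the defaults with the given fields replaced.

    dataclasses.replace raises TypeError on unknown field names,
    which catches typos in override keys.
    """
    return replace(FlappyConfigSketch(), **(overrides or {}))

cfg = apply_overrides({"easy_gap_height": 7.0, "gravity": -0.25})
```

Here cfg carries the two overridden values, and every other field keeps its default.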
Metrics
| Metric | Meaning |
|---|---|
| pipes_passed | Number of pipes successfully navigated |
| survival_score | Accumulated survival bonus (higher for longer runs) |
| done | Episode termination flag |
| reward | Weighted combination: 2.0 × pipes + 0.2 × survival + 0.2 × format |
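The weighted combination in the last row can be written out as a short function. This is only an arithmetic sketch of the formula in the table; total_reward is a hypothetical name, and treating the format term as a score of at most 1.0 is an assumption.

```python
def total_reward(pipes_passed: float, survival_score: float,
                 format_score: float) -> float:
    """Weighted sum using the rubric weights listed in this README."""
    return 2.0 * pipes_passed + 0.2 * survival_score + 0.2 * format_score

# Example: 3 pipes, survival score 10.0, format score 1.0 (assumed maximum)
example = total_reward(3, 10.0, 1.0)  # 6.0 + 2.0 + 0.2
```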
Survival Reward Details
The survival reward accumulates each tick the bird stays alive:
- Base: 1.0 per tick
- Multiplier: 1.0 + 0.005 × min(step, 80)
- Encourages both survival and sustained play
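The accumulation described above can be sketched as follows. This assumes that the per-tick bonus is the base times the multiplier and that step counts ticks starting from 1; neither detail is stated here, so treat the numbers as illustrative. The name survival_score_after is hypothetical.

```python
def survival_score_after(ticks: int, base: float = 1.0,
                         rate: float = 0.005, cap: int = 80) -> float:
    """Sum the per-tick survival bonus over `ticks` surviving ticks."""
    total = 0.0
    for step in range(1, ticks + 1):  # assumption: step is 1-indexed
        multiplier = 1.0 + rate * min(step, cap)
        total += base * multiplier    # assumption: bonus = base * multiplier
    return total
```

Under these assumptions the multiplier grows from 1.005 on the first tick to 1.4 at tick 80 and stays there, so each tick after the 80th adds 1.4.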
Collision Detection
The bird is treated as a circle with center (Bx, By) and radius R:
- Boundary collision: By + R ≥ +10 or By - R ≤ -10
- Pipe collision: When the bird's X range overlaps the pipe's X range AND the bird's Y range extends outside the gap
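The two checks above can be sketched as range tests. Several details are assumptions made for this example: the pipe_width parameter (no pipe width is documented here), treating the pipe's x as its center, treating gapY as the vertical center of the gap, and the strict inequalities at the gap edges. The function names are hypothetical.

```python
def hits_boundary(by: float, r: float, world_h: float = 20.0) -> bool:
    """Boundary check: the bird's circle touches the top or bottom bound."""
    half = world_h / 2.0
    return by + r >= half or by - r <= -half

def hits_pipe(bx: float, by: float, r: float,
              pipe_x: float, gap_y: float, gap_h: float,
              pipe_width: float = 1.0) -> bool:
    """Pipe check using X-range overlap and Y-range vs. gap."""
    # Assumption: pipe_x is the pipe's center and pipe_width is hypothetical.
    x_overlap = (bx + r >= pipe_x - pipe_width / 2.0
                 and bx - r <= pipe_x + pipe_width / 2.0)
    # Assumption: gap_y is the gap center, so the gap spans gap_y ± gap_h / 2.
    outside_gap = (by + r > gap_y + gap_h / 2.0
                   or by - r < gap_y - gap_h / 2.0)
    return x_overlap and outside_gap
```

With the default radius of 0.20, a bird at By = 9.9 already counts as a boundary collision, since 9.9 + 0.2 ≥ 10.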
Evaluation Reports
No reports found. Run uv run vf-eval flappybird to generate one.