
Flappybird RL Env (Antim)


Flappybird RL Env (Antim) is an RL environment from Antim.

Type: RL Env
Publisher: Antim
Runtime: multi-turn
License: unknown
Version: v2.0.0
Published: Dec 2025


flappybird

Overview

  • Environment ID: flappybird
  • Short description: Multi-turn Flappy Bird control environment where an LLM decides TAP or NOOP each tick to navigate through pipes.
  • Tags: rl, game, control, verifiers, real-time

A physics-based Flappy Bird simulation in which the agent controls a bird flying through gaps in scrolling pipes. The bird falls under gravity and must tap to jump, avoiding collisions with the pipes and the world boundaries. Difficulty ramps up progressively, from easy initial gaps to tighter challenges.

Datasets

  • Primary dataset: Synthetic game states generated from fresh FlappyGame instances
  • Source: Custom implementation with configurable physics
  • Split sizes: Configurable via num_examples (default 10)

Task

  • Type: Multi-turn real-time control
  • Parser: vf.XMLParser(fields=["think", "actions"], answer_field="actions")
  • Rubric overview:
    • flappy_reward_func (weight 2.0): Number of pipes successfully passed
    • flappy_survival_reward (weight 0.2): Accumulated survival bonus with time-based multiplier
    • Parser format reward (weight 0.2): Enforces <THINK>/<ACTIONS> XML structure

Action Format

The agent outputs one action per tick:

<THINK>Brief plan using physics projection</THINK>
<ACTIONS>[TAP]</ACTIONS>

or for no-op:

<THINK>Safe position, let gravity adjust</THINK>
<ACTIONS>[]</ACTIONS>

Available actions:

  • [TAP]: Sets the vertical velocity to the jump impulse (+1.6)
  • []: No action; gravity applies (-0.3 per tick)
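The action format above can be handled with a small regex when testing prompts locally. This is a hypothetical helper, not the environment's parser (the environment uses vf.XMLParser). Two behaviors are assumptions: tags are matched case-insensitively, and a missing or malformed <ACTIONS> block is treated as a no-op.

```python
import re

# Matches <ACTIONS>[...]</ACTIONS> and captures the bracket contents.
# Case-insensitive matching is an assumption made for robustness.
ACTIONS_RE = re.compile(r"<ACTIONS>\s*\[(.*?)\]\s*</ACTIONS>", re.IGNORECASE | re.DOTALL)


def parse_action(reply: str) -> str:
    """Return "TAP" or "NOOP" for one model reply (hypothetical helper)."""
    match = ACTIONS_RE.search(reply)
    if match is None:
        # Assumption: a missing or malformed block is treated as a no-op.
        return "NOOP"
    return "TAP" if match.group(1).strip().upper() == "TAP" else "NOOP"


print(parse_action("<THINK>Falling fast</THINK>\n<ACTIONS>[TAP]</ACTIONS>"))  # TAP
print(parse_action("<THINK>Safe position</THINK>\n<ACTIONS>[]</ACTIONS>"))  # NOOP
```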

Game Physics

| Parameter | Default | Description |
|---|---|---|
| world_w | 24.0 | World width |
| world_h | 20.0 | World height (vertical bounds: ±10) |
| bird_x | 4.0 | Fixed horizontal bird position |
| bird_radius | 0.20 | Collision radius |
| gravity | -0.30 | Downward acceleration per tick |
| jump_impulse | 1.6 | Upward velocity on TAP |
| pipe_speed | 0.30 | Leftward pipe movement per tick |
| easy_gap_height | 6.0 | Initial gap height (easy mode) |
| base_gap_height | 8.0 | Final gap height (after ramp) |
| easy_mode_pipes | 6 | Pipes before difficulty ramp begins |
| ramp_pipes | 12 | Pipes over which difficulty increases |
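The "physics projection" mentioned in the THINK example can be illustrated with a short sketch. It rolls the bird forward under the documented defaults (gravity -0.30, jump impulse 1.6). The update order is an assumption: TAP overwrites the velocity, a no-op adds gravity, and the position then moves by the new velocity. Bird, step_bird, and project are hypothetical names, not the environment's API.

```python
from dataclasses import dataclass


@dataclass
class Bird:
    y: float = 0.0   # vertical position; the world spans [-10, 10]
    vy: float = 0.0  # vertical velocity (positive = up)


def step_bird(bird: Bird, tap: bool, gravity: float = -0.30, jump_impulse: float = 1.6) -> Bird:
    # Assumed order: TAP sets vy to the impulse, NOOP adds gravity, then y moves by vy.
    vy = jump_impulse if tap else bird.vy + gravity
    return Bird(y=bird.y + vy, vy=vy)


def project(bird: Bird, actions: list) -> list:
    """Return the bird's y (rounded) after each tick of a planned action sequence."""
    ys = []
    for tap in actions:
        bird = step_bird(bird, tap)
        ys.append(round(bird.y, 2))
    return ys


# One TAP from rest, then four no-op ticks.
print(project(Bird(), [True, False, False, False, False]))  # [1.6, 2.9, 3.9, 4.6, 5.0]
```

Under these assumptions, a single TAP from rest climbs for six ticks and peaks near y = 5.1. That is a large correction compared with a 6.0-unit easy gap.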

Observation Format

<FLAPPY id=K>
<OBS>
birdY:By
birdX:Bx
velY:Vy
gapHeight:G
birdRadius:R
pipes:[(x1,gapY1),(x2,gapY2),...]
score:Z
</OBS>
</FLAPPY>

Where:

  • By: Bird's vertical position (range: [-10, 10])
  • Bx: Bird's horizontal position (fixed at 4.0)
  • Vy: Vertical velocity (positive = up, negative = down)
  • G: Current gap height
  • R: Bird collision radius
  • pipes: List of (x, gapY) for visible pipes
  • Z: Current score (pipes passed)
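For offline experiments, an observation in the format above can be parsed into a dictionary. parse_obs and the sample values in OBS_TEXT are hypothetical. The sketch assumes each line inside <OBS> has the form key:value and that every value is a Python-style literal (a number or a list of tuples).

```python
import ast
import re

# Illustrative observation; the values are made up for this example.
OBS_TEXT = """<FLAPPY id=3>
<OBS>
birdY:-1.25
birdX:4.0
velY:0.4
gapHeight:6.0
birdRadius:0.2
pipes:[(9.5,1.0),(18.5,-2.0)]
score:2
</OBS>
</FLAPPY>"""


def parse_obs(text: str) -> dict:
    """Hypothetical parser for the key:value lines inside <OBS>...</OBS>."""
    body = re.search(r"<OBS>(.*?)</OBS>", text, re.DOTALL).group(1)
    obs = {}
    for line in body.strip().splitlines():
        key, _, value = line.partition(":")  # split on the first colon only
        # literal_eval handles plain numbers and the list of (x, gapY) tuples.
        obs[key.strip()] = ast.literal_eval(value.strip())
    return obs


obs = parse_obs(OBS_TEXT)
print(obs["pipes"][0])  # (9.5, 1.0)
```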

Quickstart

Run with defaults:

uv run vf-eval flappybird

Configure model and game parameters:

uv run vf-eval flappybird \
  -m gpt-4.1-mini \
  -n 20 -r 3 -t 512 -T 0.7 \
  -a '{
    "max_turns": 500,
    "config_overrides": {
      "easy_gap_height": 7.0,
      "gravity": -0.25
    }
  }'

Notes:

  • Use -a/--env-args for JSON kwargs forwarded to load_environment()
  • Reports are written to ./environments/flappybird/reports/

Environment Arguments

| Arg | Type | Default | Description |
|---|---|---|---|
| max_turns | int | 300 | Maximum ticks before the episode ends |
| num_examples | int | 10 | Number of game instances in the dataset |
| config | FlappyConfig or null | null | Full config object (overrides defaults) |
| config_overrides | dict or null | null | Partial overrides merged with defaults |
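One way config and config_overrides could combine is sketched below with dataclasses.replace. The FlappyConfig here is a stand-in that covers only a few documented fields, and resolve_config is a hypothetical name. Two behaviors are assumptions not stated above: overrides are applied on top of config when both are given, and unknown override keys raise an error.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class FlappyConfig:
    # Stand-in with a few documented fields and their documented defaults.
    gravity: float = -0.30
    jump_impulse: float = 1.6
    easy_gap_height: float = 6.0
    base_gap_height: float = 8.0


def resolve_config(config: Optional[FlappyConfig] = None,
                   config_overrides: Optional[dict] = None) -> FlappyConfig:
    """Hypothetical merge: start from config (or defaults), then apply overrides."""
    base = config if config is not None else FlappyConfig()
    overrides = config_overrides or {}
    known = {f.name for f in fields(FlappyConfig)}
    unknown = set(overrides) - known
    if unknown:
        # Assumption: misspelled override keys are rejected rather than ignored.
        raise ValueError(f"unknown config fields: {sorted(unknown)}")
    return replace(base, **overrides)


cfg = resolve_config(config_overrides={"easy_gap_height": 7.0, "gravity": -0.25})
print(cfg)
```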

Config Overrides

All FlappyConfig fields can be overridden:

| Field | Type | Default | Description |
|---|---|---|---|
| world_w | float | 24.0 | World width |
| world_h | float | 20.0 | World height |
| bird_x | float | 4.0 | Bird's fixed X position |
| bird_radius | float | 0.20 | Bird collision radius |
| gravity | float | -0.30 | Gravity acceleration |
| jump_impulse | float | 1.6 | TAP velocity |
| pipe_speed | float | 0.30 | Pipe scroll speed |
| pipe_spawn_interval | int | 30 | Ticks between pipe spawns |
| first_spawn_interval | int | 10 | Ticks before the first pipe |
| base_gap_height | float | 8.0 | Final gap height |
| easy_gap_height | float | 6.0 | Initial gap height |
| easy_mode_pipes | int | 6 | Pipes at easy difficulty |
| ramp_pipes | int | 12 | Pipes during the difficulty ramp |
| gap_center_offset | float | 2.0 | Max gap center deviation from center |
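The ramp fields can be read as a schedule over spawned pipes. This minimal sketch rests on two assumptions. First, the gap height stays at easy_gap_height for the first easy_mode_pipes pipes and then interpolates linearly to base_gap_height over the next ramp_pipes pipes. Second, gap centers are drawn uniformly within ±gap_center_offset of y = 0. The text above states neither the interpolation nor the sampling rule, and gap_height and gap_center are hypothetical helpers.

```python
import random


def gap_height(pipe_index: int, easy_gap: float = 6.0, base_gap: float = 8.0,
               easy_mode_pipes: int = 6, ramp_pipes: int = 12) -> float:
    """Gap height for the pipe_index-th spawned pipe (0-based).

    Assumption: constant during easy mode, then linear interpolation over the ramp.
    """
    if pipe_index < easy_mode_pipes:
        return easy_gap
    t = min((pipe_index - easy_mode_pipes) / ramp_pipes, 1.0)
    return easy_gap + t * (base_gap - easy_gap)


def gap_center(rng: random.Random, gap_center_offset: float = 2.0) -> float:
    # Assumption: uniform sampling within +/- gap_center_offset of the vertical center.
    return rng.uniform(-gap_center_offset, gap_center_offset)


print([gap_height(i) for i in (0, 5, 6, 12, 18, 30)])  # [6.0, 6.0, 6.0, 7.0, 8.0, 8.0]
```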

Metrics

| Metric | Meaning |
|---|---|
| pipes_passed | Number of pipes successfully navigated |
| survival_score | Accumulated survival bonus (higher for longer runs) |
| done | Episode termination flag |
| reward | Weighted combination: 2.0 × pipes + 0.2 × survival + 0.2 × format |
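The weighted reward in the last row is a plain linear combination of the three rubric components. In this minimal sketch, total_reward is a hypothetical name, and format_score stands for whatever value the parser's format reward returns.

```python
# Rubric weights as documented.
W_PIPES, W_SURVIVAL, W_FORMAT = 2.0, 0.2, 0.2


def total_reward(pipes_passed: int, survival_score: float, format_score: float) -> float:
    """Hypothetical helper: weighted sum of the three rubric components."""
    return W_PIPES * pipes_passed + W_SURVIVAL * survival_score + W_FORMAT * format_score


# 3 pipes, a survival score of 100.0, and a format score of 1.0.
print(round(total_reward(3, 100.0, 1.0), 6))  # 26.2
```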

Survival Reward Details

The survival reward accumulates each tick the bird stays alive:

  • Base: 1.0 per tick
  • Multiplier: 1.0 + 0.005 × min(step, 80)
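  • Because the multiplier grows linearly until step 80 (where it caps at 1.4), later ticks are worth more than early ones, which rewards sustained play over brief survival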
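Summing the per-tick bonus gives a closed form. This minimal sketch assumes step counts from 0 on the first tick. Under that assumption, the score after T ≤ 81 ticks is T + 0.0025·T(T - 1), and each tick after that adds 1.4. survival_score, tick_multiplier, and survival_closed_form are hypothetical helpers.

```python
def tick_multiplier(step: int) -> float:
    # 1.0 + 0.005 * min(step, 80): grows linearly, reaching 1.4 at step 80.
    return 1.0 + 0.005 * min(step, 80)


def survival_score(ticks_alive: int, base: float = 1.0) -> float:
    """Accumulated bonus; assumes the first tick has step = 0."""
    return sum(base * tick_multiplier(step) for step in range(ticks_alive))


def survival_closed_form(ticks_alive: int) -> float:
    # Same quantity for base = 1.0: T + 0.0025*T*(T-1) up to T = 81, then 1.4 per extra tick.
    t = min(ticks_alive, 81)
    return t + 0.0025 * t * (t - 1) + 1.4 * max(ticks_alive - 81, 0)


print(round(survival_score(81), 6), round(survival_score(100), 6))  # 97.2 123.8
```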

Collision Detection

The bird is treated as a circle with center (Bx, By) and radius R:

  • Boundary collision: By + R ≥ +10 or By - R ≤ -10
  • Pipe collision: the bird's X range [Bx - R, Bx + R] overlaps the pipe's X range AND the bird's Y range [By - R, By + R] extends outside the gap
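The two collision rules can be sketched as predicates. The boundary check follows the documented inequality directly. The pipe check depends on details the text does not give, so three things here are assumptions: pipe_width (a hypothetical parameter set to 1.0), a pipe's x being its left edge, and gapY being the gap center.

```python
def hits_boundary(by: float, r: float = 0.20, half_h: float = 10.0) -> bool:
    # Documented rule: By + R >= +10 or By - R <= -10.
    return by + r >= half_h or by - r <= -half_h


def hits_pipe(bx: float, by: float, pipe_x: float, gap_y: float, gap_h: float,
              r: float = 0.20, pipe_width: float = 1.0) -> bool:
    """Assumptions: pipe_x is the left edge, pipe_width is hypothetical, gap_y is the gap center."""
    x_overlap = bx + r > pipe_x and bx - r < pipe_x + pipe_width
    outside_gap = by + r > gap_y + gap_h / 2 or by - r < gap_y - gap_h / 2
    return x_overlap and outside_gap


print(hits_boundary(9.85))                   # True: 9.85 + 0.2 reaches the top bound
print(hits_pipe(4.0, 0.0, 3.5, 0.0, 6.0))    # False: bird is inside the gap
print(hits_pipe(4.0, 2.9, 3.5, 0.0, 6.0))    # True: bird's top edge clears the gap
```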
