Click Calibrate

Multi-turn visual click calibration tasks with click and computer tool formats across pixel and normalized coordinate schemas.

Type: RL Env
Publisher: Prime
Runtime: multi-turn
License: unknown
Size: v0.1.3
Published: May 2026

click-calibrate

Overview

  • Environment ID: click-calibrate
  • Short description: Multi-turn visual click calibration for models that emit raw pixel or normalized coordinates through either a direct click tool or a computer tool.
  • Tags: visual, click, calibration, eval

Datasets

  • Primary dataset(s): Synthetic PNG screenshots generated at load_environment() time.
  • Task families:
    • aim: a blank screen with one colored circle; click the center of the circle.
    • text: a simple rendered page/document; click a named word.
  • Default size: 80 generated examples.

Task

  • Type: multi-turn visual tool use.
  • Expected action: call the configured tool once per turn until the target is hit or the turn limit is reached.
  • Stop condition: rollout ends on a target hit or after max_turns model turns.
  • Tool formats:
    • click: direct click tool, using click(x_px, y_px) for pixels or click(x, y) for normalized coordinates.
    • computer: computer-use style tool, using computer(actions=[{"action": "left_click", "coordinate": [x, y]}]). The scorer also accepts the native top-level form computer(action="left_click", coordinate=[x, y]).
  • Coordinate modes:
    • pixel: raw image pixels.
    • norm1000: integer coordinates normalized to the range [0, 1000].
    • norm999: integer coordinates normalized to the range [0, 999].
  • Scoring: the emitted click is converted to pixels and checked against the target circle or text bounding box.
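
The scoring step described above can be sketched roughly as follows. This is a minimal illustration, not the environment's scorer: the function names are hypothetical, and the scaling rule (dividing by 1000 or 999 and multiplying by the image size) is an assumption about how normalized coordinates map to pixels.

```python
import math

# Hypothetical helpers. The divisor-based scaling is an assumption;
# the environment's own conversion may round or clamp differently.
SCALE = {"norm1000": 1000, "norm999": 999}


def extract_click(tool_args):
    """Return (x, y) from either computer-tool form, or None if absent."""
    actions = tool_args.get("actions")
    if actions is None:
        actions = [tool_args]  # top-level form: action=..., coordinate=[x, y]
    for action in actions:
        if action.get("action") == "left_click":
            coord = action.get("coordinate")
            if isinstance(coord, (list, tuple)) and len(coord) == 2:
                return coord[0], coord[1]
    return None


def to_pixels(x, y, mode, width, height):
    """Map an emitted coordinate to image pixels for the given mode."""
    if mode == "pixel":
        return float(x), float(y)
    scale = SCALE[mode]  # raises KeyError for an unknown mode
    return x / scale * width, y / scale * height


def hits_circle(px, py, cx, cy, radius):
    """aim task: the click must fall inside the target circle."""
    return math.hypot(px - cx, py - cy) <= radius


def hits_box(px, py, box):
    """text task: the click must fall inside (left, top, right, bottom)."""
    left, top, right, bottom = box
    return left <= px <= right and top <= py <= bottom
```

Under these assumptions, a norm1000 click at (500, 500) on a 1280x720 image maps to the pixel (640.0, 360.0), which is then tested against the circle or bounding box.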

Quickstart

Install the environment:

prime env install click-calibrate

Run a small pixel-coordinate eval:

prime eval run click-calibrate -m openai/gpt-4.1-mini -n 5 -r 1 -a '{"coordinate_mode": "pixel"}'

Run normalized-coordinate variants:

prime eval run click-calibrate -m openai/gpt-4.1-mini -n 5 -r 1 -a '{"coordinate_mode": "norm1000"}'
prime eval run click-calibrate -m openai/gpt-4.1-mini -n 5 -r 1 -a '{"coordinate_mode": "norm999"}'

Run the same coordinate modes through the computer tool:

prime eval run click-calibrate -m anthropic/claude-haiku-4.5 -n 5 -r 1 -a '{"tool_format": "computer", "coordinate_mode": "pixel"}'
prime eval run click-calibrate -m anthropic/claude-haiku-4.5 -n 5 -r 1 -a '{"tool_format": "computer", "coordinate_mode": "norm1000"}'
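
To run every combination rather than typing each command by hand, the commands above can be generated with a short script. The sketch below only builds and prints the command strings, using the flags shown in this section; the model list and the helper name build_command are illustrative choices.

```python
import itertools
import json
import shlex

# Illustrative sweep axes; edit to match the models you have access to.
MODELS = ["openai/gpt-4.1-mini", "anthropic/claude-haiku-4.5"]
TOOL_FORMATS = ["click", "computer"]
MODES = ["pixel", "norm1000", "norm999"]


def build_command(model, tool_format, mode, n=5, r=1):
    """Build one `prime eval run` command line as a shell-safe string."""
    env_args = {"tool_format": tool_format, "coordinate_mode": mode}
    argv = [
        "prime", "eval", "run", "click-calibrate",
        "-m", model,
        "-n", str(n),
        "-r", str(r),
        "-a", json.dumps(env_args),  # quoted for the shell by shlex.join
    ]
    return shlex.join(argv)


commands = [
    build_command(model, tool_format, mode)
    for model, tool_format, mode in itertools.product(MODELS, TOOL_FORMATS, MODES)
]

for command in commands:
    print(command)
```

Printing instead of executing keeps the script safe to run anywhere; the output can be pasted into a shell or piped to one.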

Visualize Saved Runs

Generate a static HTML index for all saved runs under outputs/evals:

uv run python environments/click_calibrate/view_eval.py

Generate a viewer for one saved eval run directory or results.jsonl file:

uv run python environments/click_calibrate/view_eval.py \
  environments/click_calibrate/outputs/evals/<run>/results.jsonl

The viewer overlays the ground-truth target in green and the model's attempted click in red. Use the on-page arrows or keyboard left/right arrows to move between samples.

Environment Arguments

| Arg | Type | Default | Description |
| --- | --- | --- | --- |
| coordinate_mode | str | "pixel" | One of pixel, norm1000, or norm999. |
| tool_format | str | "click" | One of click or computer. |
| dataset_size | int | 80 | Number of synthetic examples to generate. |
| task_mix | str | "aim,text" | Comma-separated task families. |
| seed | int | 0 | Random seed for deterministic generation. |
| width | int or null | null | Fixed image width. Must be paired with height. |
| height | int or null | null | Fixed image height. Must be paired with width. |
| tool_name | str or null | null | Optional override for the exposed tool name. Defaults to click for tool_format="click" and computer for tool_format="computer". |
| text_fallback | bool | false | Include JSON fallback instructions for non-tool-call endpoints. Leave disabled for click-tool calibration. |
| sweep_tag | str or null | null | Optional label saved in run metadata and sample info for distinguishing eval sweeps. |
| max_turns | int | 10 | Maximum attempts per rollout. |
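
Before passing arguments with -a, it can help to check them against the constraints listed in the table. The following is a minimal sketch of such a check; check_env_args is a hypothetical helper, and the environment's own validation may be stricter or report errors differently.

```python
import json

# Allowed values taken from the argument table; the helper is illustrative.
ALLOWED = {
    "coordinate_mode": {"pixel", "norm1000", "norm999"},
    "tool_format": {"click", "computer"},
}
TASK_FAMILIES = {"aim", "text"}


def check_env_args(args):
    """Return a list of problems found in an env-args dict (empty if none)."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in args and args[key] not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    # width and height are only meaningful as a pair.
    if (args.get("width") is None) != (args.get("height") is None):
        problems.append("width and height must be set together")
    families = [f.strip() for f in args.get("task_mix", "aim,text").split(",")]
    unknown = sorted(set(families) - TASK_FAMILIES)
    if unknown:
        problems.append(f"unknown task families: {unknown}")
    return problems


env_args = {"coordinate_mode": "norm1000", "width": 1280, "height": 720}
if not check_env_args(env_args):
    print(json.dumps(env_args))  # string to pass after -a
```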

Metrics

| Metric | Meaning |
| --- | --- |
| click_hit | Main reward: 1 if any attempted click lands on the target, else 0. |
| first_turn_accuracy | 1 if the first model turn hits the target. |
| last_turn_accuracy | 1 if the final model turn hits the target. |
| click_extracted | 1 if any click could be parsed from tool calls or text fallback. |
| coordinate_valid | 1 if the final emitted click has coordinates in the declared range. |
| tool_call_used | 1 if the model used the configured tool instead of text fallback. |
| distance_px | Zero-weight metric: pixel distance from the final attempted click to the target center. Lower is better. |
| target_click_distance_px | Zero-weight alias for distance_px with a more explicit name. |
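
As a rough illustration of how several of these metrics relate to one rollout, the sketch below derives them from a per-turn list of clicks. The function summarize_rollout and its input shape are assumptions made for the example; in particular, it treats the last parsed click as the "final attempted click" for distance_px.

```python
import math


def summarize_rollout(clicks, target_center, is_hit):
    """Compute per-rollout metrics.

    clicks: one entry per model turn, either (x_px, y_px) or None when
            no click could be parsed on that turn.
    target_center: (x_px, y_px) of the target.
    is_hit: callable taking (x_px, y_px) and returning True on a hit.
    """
    hits = [c is not None and is_hit(c) for c in clicks]
    attempted = [c for c in clicks if c is not None]
    distance = math.dist(attempted[-1], target_center) if attempted else None
    return {
        "click_hit": float(any(hits)),
        "first_turn_accuracy": float(bool(hits) and hits[0]),
        "last_turn_accuracy": float(bool(hits) and hits[-1]),
        "click_extracted": float(bool(attempted)),
        "distance_px": distance,
    }


# Example: a circle of radius 5 centered at (50, 50), hit on the third turn.
center = (50, 50)
inside = lambda c: math.dist(c, center) <= 5
result = summarize_rollout([(100, 100), None, (52, 49)], center, inside)
print(result)
```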

v0.1.0 Results

Corrected sweep over 5 models, 2 coordinate modes, and 5 image sizes, each run with -n 20 -r 5 -s (20 examples, 5 rollouts per example, results saved).

Aggregate results:

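| Model | Mode | Reward | Distance px | Click extracted | Tool call used | Truncation | Error |
| --- | --- | --- | --- | --- | --- | --- | --- |
| openai/gpt-5.5 | pixel | 0.910 | 10.8 | 1.000 | 1.000 | 0.000 | 0.000 |
| openai/gpt-5.5 | norm1000 | 0.908 | 10.9 | 1.000 | 1.000 | 0.000 | 0.000 |
| anthropic/claude-opus-4.7 | pixel | 0.900 | 11.7 | 1.000 | 1.000 | 0.000 | 0.000 |
| anthropic/claude-opus-4.7 | norm1000 | 0.898 | 13.7 | 1.000 | 1.000 | 0.000 | 0.000 |
| anthropic/claude-haiku-4.5 | pixel | 0.826 | 21.1 | 1.000 | 1.000 | 0.000 | 0.000 |
| anthropic/claude-haiku-4.5 | norm1000 | 0.712 | 27.8 | 1.000 | 1.000 | 0.000 | 0.000 |
| qwen/qwen3.6-35b-a3b | pixel | 0.174 | 234.2 | 0.966 | 0.972 | 0.008 | 0.004 |
| qwen/qwen3.6-35b-a3b | norm1000 | 0.588 | 186.4 | 0.940 | 0.940 | 0.008 | 0.024 |
| moonshotai/kimi-k2.6 | pixel | 0.790 | 39.2 | 0.988 | 0.834 | 0.000 | 0.000 |
| moonshotai/kimi-k2.6 | norm1000 | 0.746 | 57.2 | 0.994 | 0.842 | 0.000 | 0.000 |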

Reward by image size:

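| Model | Mode | 1024x768 | 1440x900 | 1280x720 | 1000x1000 | 800x600 |
| --- | --- | --- | --- | --- | --- | --- |
| openai/gpt-5.5 | pixel | 0.85 | 0.85 | 1.00 | 0.85 | 1.00 |
| openai/gpt-5.5 | norm1000 | 0.84 | 0.85 | 1.00 | 0.85 | 1.00 |
| anthropic/claude-opus-4.7 | pixel | 0.85 | 0.85 | 0.95 | 0.85 | 1.00 |
| anthropic/claude-opus-4.7 | norm1000 | 0.85 | 0.85 | 0.94 | 0.85 | 1.00 |
| anthropic/claude-haiku-4.5 | pixel | 0.85 | 0.45 | 0.99 | 0.84 | 1.00 |
| anthropic/claude-haiku-4.5 | norm1000 | 0.72 | 0.22 | 0.83 | 0.85 | 0.94 |
| qwen/qwen3.6-35b-a3b | pixel | 0.09 | 0.01 | 0.04 | 0.71 | 0.02 |
| qwen/qwen3.6-35b-a3b | norm1000 | 0.60 | 0.48 | 0.51 | 0.69 | 0.66 |
| moonshotai/kimi-k2.6 | pixel | 0.77 | 0.56 | 0.94 | 0.77 | 0.91 |
| moonshotai/kimi-k2.6 | norm1000 | 0.65 | 0.68 | 0.88 | 0.78 | 0.74 |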
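
In the published numbers, each aggregate Reward value matches the unweighted mean of the five per-size rewards for the same model and mode. This is an observation from the tables rather than a documented property of the sweep. A quick check for the pixel-mode rows:

```python
from statistics import fmean

# Pixel-mode rewards copied from the per-size table, in column order
# 1024x768, 1440x900, 1280x720, 1000x1000, 800x600.
PER_SIZE = {
    "openai/gpt-5.5": [0.85, 0.85, 1.00, 0.85, 1.00],
    "anthropic/claude-opus-4.7": [0.85, 0.85, 0.95, 0.85, 1.00],
    "anthropic/claude-haiku-4.5": [0.85, 0.45, 0.99, 0.84, 1.00],
    "qwen/qwen3.6-35b-a3b": [0.09, 0.01, 0.04, 0.71, 0.02],
    "moonshotai/kimi-k2.6": [0.77, 0.56, 0.94, 0.77, 0.91],
}

# Pixel-mode Reward values copied from the aggregate table.
AGGREGATE = {
    "openai/gpt-5.5": 0.910,
    "anthropic/claude-opus-4.7": 0.900,
    "anthropic/claude-haiku-4.5": 0.826,
    "qwen/qwen3.6-35b-a3b": 0.174,
    "moonshotai/kimi-k2.6": 0.790,
}

# Per-size values are rounded to two decimals, so allow a small tolerance.
means = {model: fmean(rewards) for model, rewards in PER_SIZE.items()}
consistent = {
    model: abs(means[model] - AGGREGATE[model]) < 0.005 for model in PER_SIZE
}

for model in PER_SIZE:
    print(f"{model}: mean={means[model]:.3f} aggregate={AGGREGATE[model]:.3f}")
```

The same comparison can be repeated for the norm1000 rows by swapping in their values.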