Wikispeedia RL Env (Community)

Navigate Wikipedia articles by following hyperlinks to reach a target article

Type: RL Env
License: apache-2.0
Published: Apr 2026


wikispeedia


Navigate Wikipedia articles by following hyperlinks to reach a target article. Based on the Wikispeedia game (West, Pineau & Precup, IJCAI 2009): the model is given a source and target article and must find a path by clicking links — only links present in the current article are reachable. Two tools are exposed:

  • click_link(article): navigate to a linked article. Returns the article text (omitted when links_only=true) and the outgoing-link menu.
  • go_back(): return to the previous article, undoing the last click. Optional: enabled by default, disabled with allow_go_back=false.
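
The two tools can be pictured as operations on a small piece of per-rollout state. The following is a minimal sketch on a made-up five-article graph; the NavSession class, its fields, and the returned message strings are hypothetical and are not taken from the environment's source. It assumes go_back does not count as a click.

```python
from dataclasses import dataclass, field

# Toy link graph; article names are made up for illustration.
TOY_LINKS = {
    "Cat": ["Mammal", "Pet"],
    "Mammal": ["Animal", "Cat"],
    "Pet": ["Dog"],
    "Animal": [],
    "Dog": ["Mammal"],
}


@dataclass
class NavSession:
    """Hypothetical per-rollout state; not the environment's actual class."""
    links: dict
    current: str
    target: str
    history: list = field(default_factory=list)  # articles we can go back to
    clicks: int = 0

    def click_link(self, article: str) -> str:
        # Only links present in the current article are reachable.
        if article not in self.links[self.current]:
            return f"'{article}' is not linked from '{self.current}'."
        self.history.append(self.current)
        self.current = article
        self.clicks += 1
        return self._menu()

    def go_back(self) -> str:
        # Undo the last click by popping the history stack.
        if not self.history:
            return "Already at the source article."
        self.current = self.history.pop()
        return self._menu()

    def _menu(self) -> str:
        outgoing = ", ".join(self.links[self.current]) or "(no links)"
        return f"{self.current} -> {outgoing}"

    @property
    def reached_target(self) -> bool:
        return self.current == self.target


session = NavSession(TOY_LINKS, current="Cat", target="Dog")
session.click_link("Mammal")   # valid: Cat links to Mammal
session.click_link("Dog")      # rejected: Mammal does not link to Dog
session.go_back()              # back to Cat
session.click_link("Pet")
session.click_link("Dog")      # target reached after three accepted clicks
```

With allow_go_back=false the go_back method would simply not be exposed, so a detour like the one to "Mammal" could not be undone.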

Overview

  • Environment ID: wikispeedia
  • Short description: Multi-turn graph navigation over a static Wikipedia subset; reward = reached target.
  • Tags: multi-turn, tool-use, navigation, rl

Datasets

  • Primary dataset: Stanford SNAP Wikispeedia — a condensed Wikipedia with 4,604 articles and ~120K directed hyperlinks (avg out-degree 26). Downloaded on first env access; cached under ~/.cache/wikispeedia.
  • Source links: Wikispeedia at SNAP, paper (IJCAI 2009)
  • Split: a fixed random train/eval partition with disjoint target articles.
    • Articles are deterministically split into a train-target pool (~90%) and an eval-target pool (~10%); no target article ever appears in both splits.
    • train (50,000 pairs): random (source, target) pairs sampled uniformly, with the target drawn from the train pool and the pair restricted to shortest-path distances 3–8 (the default band).
    • eval (1,000 pairs): random pairs with the target drawn from the eval pool, in the same distance band.
    • Sampling is seeded, so repeated runs see the same split. Split sizes are fixed (not configurable).
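
The split procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the environment's implementation: the function names, the seed values, the use of rejection sampling, and the toy ring graph are all hypothetical. Only the ideas (disjoint target pools, a shortest-path distance band, seeded sampling) come from the description.

```python
import random
from collections import deque


def bfs_distances(links, source):
    """Shortest-path distance (in clicks) from source to every reachable article."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def split_targets(articles, eval_fraction=0.1, seed=0):
    """Deterministically partition articles into disjoint train/eval target pools."""
    shuffled = sorted(articles)            # fixed order before the seeded shuffle
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]   # (train_pool, eval_pool)


def sample_pairs(links, target_pool, n_pairs, min_dist, max_dist, seed=0):
    """Rejection-sample (source, target, distance) triples inside the distance band."""
    rng = random.Random(seed)
    sources = sorted(links)
    targets = sorted(target_pool)
    pairs, attempts = [], 0
    while len(pairs) < n_pairs and attempts < 100 * n_pairs:
        attempts += 1
        src, tgt = rng.choice(sources), rng.choice(targets)
        d = bfs_distances(links, src).get(tgt)    # None if unreachable
        if d is not None and min_dist <= d <= max_dist:
            pairs.append((src, tgt, d))
    return pairs


# Toy directed ring of 20 articles: A00 -> A01 -> ... -> A19 -> A00.
ring = {f"A{i:02d}": [f"A{(i + 1) % 20:02d}"] for i in range(20)}
train_pool, eval_pool = split_targets(ring)
train_pairs = sample_pairs(ring, train_pool, n_pairs=5, min_dist=3, max_dist=8)
eval_pairs = sample_pairs(ring, eval_pool, n_pairs=2, min_dist=3, max_dist=8)
```

Because both the shuffle and the sampler use their own seeded random.Random instance, calling the functions again with the same inputs reproduces the same pools and pairs.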

Quickstart

Run a single debug rollout

prime eval wikispeedia -n1 -r1 -d -v

Full eval with prime

prime eval run wikispeedia -n 50 -r 1

Harder difficulty (longer shortest paths)

prime eval run wikispeedia -a '{"min_path_length": 5, "max_path_length": 8}'

Hide article body, expose only the link menu

prime eval run wikispeedia -a '{"links_only": true}'
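
When the argument set grows, hand-writing the JSON passed to -a becomes error-prone. A small sketch of building the same command line from a Python dict is shown here; it only assembles and prints the string, and it assumes that several arguments can be combined in one JSON object, as the min_path_length / max_path_length example above does.

```python
import json
import shlex

# Keys are environment arguments documented in this README.
env_args = {"min_path_length": 5, "max_path_length": 8, "links_only": True}

# json.dumps emits lowercase true/false, matching the JSON used in the commands above.
payload = json.dumps(env_args)

# shlex.join quotes the payload so the shell passes it as a single argument.
command = shlex.join(["prime", "eval", "run", "wikispeedia", "-a", payload])
print(command)
```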

Environment Arguments

| Arg | Type | Default | Description |
| --- | --- | --- | --- |
| max_turns | int | 50 | Hard cap on agent turns per rollout. |
| min_path_length | int | 3 | Minimum shortest-path distance between source and target. The graph supports distances 1..9; ~78% of pairs sit at distance 3–4. Tightening this filters both train and eval. |
| max_path_length | int | 8 | Maximum shortest-path distance. Only ~470 pairs sit at distance 8 and 5 pairs at distance 9, which makes them useful for the hardest tail. |
| cache_dir | str or null | null | Override the cache directory for the SNAP tarballs. Falls back to ~/.cache/wikispeedia (env var: WIKISPEEDIA_CACHE_DIR). |
| links_only | bool | false | If true, hide article bodies and show only the outgoing-link menu. An ablation for whether the agent navigates from semantic content or from link names alone. |
| allow_go_back | bool | true | If false, the go_back tool is not registered (and not mentioned in the system prompt). Every click is permanent, which forces planning over backtracking. |
| train_only | bool | false | If true, vf-eval runs on the train split (50K pairs) instead of the eval split (1K). A workaround, since vf-eval has no built-in flag for evaluating on the train split. |
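
The defaults and constraints in the argument table can be mirrored in a small validation sketch. The WikispeediaArgs class below is hypothetical and is not the environment's own code. In particular, the lookup order for the cache directory (explicit argument, then WIKISPEEDIA_CACHE_DIR, then the default path) is an assumption, since the table does not state the precedence.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class WikispeediaArgs:
    """Hypothetical mirror of the argument table; defaults copied from it."""
    max_turns: int = 50
    min_path_length: int = 3
    max_path_length: int = 8
    cache_dir: Optional[str] = None
    links_only: bool = False
    allow_go_back: bool = True
    train_only: bool = False

    def __post_init__(self):
        # The table states that the graph supports distances 1..9.
        if not 1 <= self.min_path_length <= self.max_path_length <= 9:
            raise ValueError("need 1 <= min_path_length <= max_path_length <= 9")
        if self.max_turns < 1:
            raise ValueError("max_turns must be positive")

    def resolved_cache_dir(self) -> Path:
        # Assumed precedence: explicit argument, then env var, then default.
        if self.cache_dir:
            return Path(self.cache_dir).expanduser()
        env_value = os.environ.get("WIKISPEEDIA_CACHE_DIR")
        if env_value:
            return Path(env_value).expanduser()
        return Path.home() / ".cache" / "wikispeedia"


hard = WikispeediaArgs(min_path_length=5, max_path_length=8, allow_go_back=False)
print(hard.resolved_cache_dir())
```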

The default vf-eval invocation runs the full 1,000-pair eval split with 4 rollouts per example (num_examples=1000, rollouts_per_example=4); these are baked into pyproject.toml.

Metrics

| Metric | Weight | Meaning |
| --- | --- | --- |
| reached_target | 1.0 | 1.0 if the agent navigated to the target article, else 0.0. The main reward signal. |
| path_efficiency | 0.0 | shortest_path / actual_steps on successful rollouts (1.0 = optimal). Logged only. |
| path_length | 0.0 | Number of links clicked. Logged only. |

Because path_efficiency and path_length carry zero weight, the reward is identical to reached_target.
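
A worked example of how the three metrics combine into the reward, as a sketch only. The function names are illustrative, and returning 0.0 for path_efficiency on a failed rollout is an assumption: the description defines that metric for successful rollouts only.

```python
def reached_target(final_article: str, target: str) -> float:
    """1.0 if the rollout ended on the target article, else 0.0."""
    return 1.0 if final_article == target else 0.0


def path_efficiency(shortest_path: int, actual_steps: int, success: bool) -> float:
    """shortest_path / actual_steps for successful rollouts (1.0 = optimal)."""
    # Assumption: failed rollouts score 0.0; the source leaves this case undefined.
    if not success or actual_steps == 0:
        return 0.0
    return shortest_path / actual_steps


def total_reward(metrics: dict, weights: dict) -> float:
    """Weighted sum of metrics; zero-weight metrics are logged but do not count."""
    return sum(weights[name] * value for name, value in metrics.items())


# Weights as listed in the metrics table.
WEIGHTS = {"reached_target": 1.0, "path_efficiency": 0.0, "path_length": 0.0}

# A successful rollout that needed 5 clicks where 3 would have sufficed.
metrics = {
    "reached_target": reached_target("Dog", "Dog"),
    "path_efficiency": path_efficiency(shortest_path=3, actual_steps=5, success=True),
    "path_length": 5.0,
}
reward = total_reward(metrics, WEIGHTS)
```

Here path_efficiency is 3/5 and path_length is 5, but only reached_target contributes, so a detour costs nothing in reward terms.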

Per-pair info

Each dataset row's info includes:

  • source, target, shortest_path
  • human_attempts, human_success_rate, human_avg_rating: aggregated SNAP human-play statistics for the pair. Present only when SNAP has at least one human play for the pair; absent otherwise. With the random split, these fields are sparse on both train and eval rows.
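
Since the human-play fields are optional, code that consumes info should treat them as possibly missing. The sketch below uses two made-up rows and a hypothetical human_baseline helper. It assumes that shortest_path is stored as an integer distance and that human_success_rate is a fraction in [0, 1]; neither is specified above.

```python
# Two made-up rows: one with SNAP human-play statistics, one without.
rows = [
    {"info": {"source": "Cat", "target": "Dog", "shortest_path": 3,
              "human_attempts": 4, "human_success_rate": 0.75,
              "human_avg_rating": 2.5}},
    {"info": {"source": "Pet", "target": "Animal", "shortest_path": 4}},
]


def human_baseline(rows):
    """Attempt-weighted human success rate over rows that carry human stats."""
    attempts = 0
    successes = 0.0
    for row in rows:
        info = row["info"]
        n = info.get("human_attempts")   # absent when SNAP has no plays for the pair
        if n is None:
            continue
        attempts += n
        successes += n * info["human_success_rate"]
    rate = successes / attempts if attempts else None
    return rate, attempts


rate, n_attempts = human_baseline(rows)
```

Returning None when no row has human data keeps "no baseline available" distinct from a genuine 0% human success rate.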