wikispeedia
Navigate Wikipedia articles by following hyperlinks to reach a target article. Based on the Wikispeedia game (West, Pineau & Precup, IJCAI 2009): the model is given a source and target article and must find a path by clicking links — only links present in the current article are reachable. Two tools are exposed:
- `click_link(article)`: navigate to a linked article. Returns the article text and the outgoing-link menu.
- `go_back()`: return to the previous article (undo the last click). Optional; disable it with `allow_go_back=false`.
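The semantics of the two tools can be sketched on a toy graph. This is a minimal illustration and not the environment's implementation: `TOY_LINKS`, `NavigationSession`, and its fields are hypothetical names, and the sketch returns only the link menu, whereas the real `click_link` also returns article text.

```python
from dataclasses import dataclass, field

# Hypothetical toy graph: article -> articles it links to (directed).
TOY_LINKS = {
    "Cat": ["Mammal", "Pet"],
    "Mammal": ["Animal"],
    "Pet": ["Dog"],
    "Animal": [],
    "Dog": ["Mammal"],
}


@dataclass
class NavigationSession:
    links: dict
    current: str
    allow_go_back: bool = True
    history: list = field(default_factory=list)  # articles visited before `current`

    def click_link(self, article: str) -> list:
        """Move to `article` if the current article links to it; return its link menu."""
        if article not in self.links[self.current]:
            raise ValueError(f"{article!r} is not linked from {self.current!r}")
        self.history.append(self.current)
        self.current = article
        return self.links[article]

    def go_back(self) -> str:
        """Undo the last click and return the article we land on."""
        if not self.allow_go_back:
            raise RuntimeError("go_back is disabled")
        if not self.history:
            raise RuntimeError("no previous article to return to")
        self.current = self.history.pop()
        return self.current
```

Keeping a history stack makes `go_back()` a constant-time undo. With `allow_go_back=False` the same session rejects the call, which mirrors the "every click is permanent" setting.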
Overview
- Environment ID: `wikispeedia`
- Short description: Multi-turn graph navigation over a static Wikipedia subset; reward = reached target.
- Tags: `multi-turn`, `tool-use`, `navigation`, `rl`
Datasets
- Primary dataset: Stanford SNAP Wikispeedia, a condensed Wikipedia with 4,604 articles and ~120K directed hyperlinks (avg out-degree 26). Downloaded on first env access; cached under `~/.cache/wikispeedia`.
- Source links: Wikispeedia at SNAP, paper (IJCAI 2009)
- Split: a fixed random train/eval partition with disjoint target articles.
  - Articles are deterministically split into a train-target pool (~90%) and an eval-target pool (~10%); no target article ever appears in both splits.
  - train (50,000 pairs): random `(source, target)` pairs sampled uniformly with the target drawn from the train pool, restricted to shortest-path distances 3–8.
  - eval (1,000 pairs): random pairs with the target drawn from the eval pool, same distance band.
  - Sampling is seeded, so two runs see the same split. Sizes are baked in (not configurable).
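The split and sampling scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the environment's code: the function names, the default seed, and the use of rejection sampling are all hypothetical. Shortest-path distance is computed with a breadth-first search over directed links.

```python
import random
from collections import deque


def shortest_distances(links, source):
    """BFS over directed links; returns {article: hop count from source}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def split_targets(articles, eval_fraction=0.1, seed=0):
    """Seeded shuffle of the sorted article list, then a fixed cut."""
    pool = sorted(articles)
    random.Random(seed).shuffle(pool)
    n_eval = max(1, round(len(pool) * eval_fraction))
    return pool[n_eval:], pool[:n_eval]  # (train targets, eval targets)


def sample_pairs(links, targets, n_pairs, min_dist=3, max_dist=8, seed=0):
    """Rejection-sample (source, target, distance) triples inside the band."""
    rng = random.Random(seed)
    sources = sorted(links)
    pairs = []
    for _ in range(100 * n_pairs):  # bounded number of attempts
        if len(pairs) >= n_pairs:
            break
        source = rng.choice(sources)
        target = rng.choice(targets)
        d = shortest_distances(links, source).get(target)  # None if unreachable
        if d is not None and min_dist <= d <= max_dist:
            pairs.append((source, target, d))
    return pairs
```

Sorting before the seeded shuffle keeps the partition independent of dictionary or file ordering, which is one simple way to get the "two runs see the same split" property. A real implementation would likely cache the BFS result per source instead of recomputing it for every candidate pair.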
Quickstart
Run a single debug rollout
prime eval wikispeedia -n1 -r1 -d -v
Full eval with prime
prime eval run wikispeedia -n 50 -r 1
Harder difficulty (longer shortest paths)
prime eval run wikispeedia -a '{"min_path_length": 5, "max_path_length": 8}'
Hide article body, expose only the link menu
prime eval run wikispeedia -a '{"links_only": true}'
Environment Arguments
| Arg | Type | Default | Description |
|---|---|---|---|
| `max_turns` | int | `50` | Hard cap on agent turns per rollout. |
| `min_path_length` | int | `3` | Minimum shortest-path distance between source and target. The graph supports distances 1..9; ~78% of pairs sit at dist 3–4. Tightening this filters both train and eval. |
| `max_path_length` | int | `8` | Maximum shortest-path distance. Only ~470 pairs sit at dist=8 and 5 pairs at dist=9, so raising it is useful for the hardest tail. |
| `cache_dir` | str \| null | `null` | Override cache directory for the SNAP tarballs. Falls back to `~/.cache/wikispeedia` (env var: `WIKISPEEDIA_CACHE_DIR`). |
| `links_only` | bool | `false` | If true, hide article bodies and show only the outgoing-link menu. This is an ablation for whether the agent navigates from semantic content or link names alone. |
| `allow_go_back` | bool | `true` | If false, the `go_back` tool is not registered (and not mentioned in the system prompt). Every click is permanent, which forces planning over backtracking. |
| `train_only` | bool | `false` | If true, `vf-eval` runs on the train split (50K pairs) instead of the eval split (1K). Workaround for evaluating on train without a built-in flag. |
The default vf-eval invocation runs the full 1,000-pair eval split with 4 rollouts per example (num_examples=1000, rollouts_per_example=4); these are baked into pyproject.toml.
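For the `cache_dir` argument, one plausible resolution order is: explicit argument first, then the `WIKISPEEDIA_CACHE_DIR` environment variable, then the default location. The sketch below assumes that precedence, which the table does not spell out; `resolve_cache_dir` is a hypothetical helper, not a function exported by the environment.

```python
import os
from pathlib import Path


def resolve_cache_dir(cache_dir=None, environ=None):
    """Pick the cache directory. Precedence here is an assumption:
    explicit argument, then WIKISPEEDIA_CACHE_DIR, then ~/.cache/wikispeedia."""
    env = os.environ if environ is None else environ
    if cache_dir:
        return Path(cache_dir).expanduser()
    from_env = env.get("WIKISPEEDIA_CACHE_DIR")
    if from_env:
        return Path(from_env).expanduser()
    return Path.home() / ".cache" / "wikispeedia"
```

Passing `environ` explicitly keeps the helper easy to test without mutating the process environment.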
Metrics
| Metric | Weight | Meaning |
|---|---|---|
| `reached_target` | 1.0 | 1.0 if the agent navigated to the target article, else 0.0. The main reward signal. |
| `path_efficiency` | 0.0 | `shortest_path / actual_steps` on successful rollouts (1.0 = optimal). Logged only. |
| `path_length` | 0.0 | Number of links clicked. Logged only. |
The reward is identical to reached_target.
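The three metrics and the weighted reward can be sketched for a single rollout. This is a minimal illustration: `score_rollout` is a hypothetical name, the rollout is assumed to end on the click that reaches the target, and reporting `path_efficiency` as 0.0 for failed rollouts is an assumption (the table only defines it for successful ones).

```python
def score_rollout(clicks, target, shortest_path):
    """clicks: articles clicked, in order (the source itself is not a click)."""
    path_length = len(clicks)
    # Assumption: a successful rollout ends on the click that lands on the target.
    reached_target = 1.0 if clicks and clicks[-1] == target else 0.0
    if reached_target:
        path_efficiency = shortest_path / path_length
    else:
        path_efficiency = 0.0  # assumption: undefined in the table for failures
    # Weights from the metrics table: only reached_target contributes.
    reward = 1.0 * reached_target + 0.0 * path_efficiency + 0.0 * path_length
    return {
        "reached_target": reached_target,
        "path_efficiency": path_efficiency,
        "path_length": path_length,
        "reward": reward,
    }
```

For example, reaching the target in 4 clicks when the shortest path is 2 gives `reached_target = 1.0`, `path_efficiency = 0.5`, and `reward = 1.0`.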
Per-pair info
Each dataset row's info includes:
- `source`, `target`, `shortest_path`
- `human_attempts`, `human_success_rate`, `human_avg_rating`: aggregated SNAP human-play statistics for the pair. Present only when SNAP has at least one human play for the pair; absent otherwise. With the random split, this is sparse on both train and eval rows.
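Because the human-play fields may be absent, code that consumes `info` should read them defensively. The sketch below assumes `info` is a plain dict keyed by the field names listed above; `describe_pair` and the example values are hypothetical.

```python
def describe_pair(info):
    """Format one dataset row's info, tolerating missing human-play fields."""
    line = f"{info['source']} -> {info['target']} (shortest path {info['shortest_path']})"
    attempts = info.get("human_attempts")  # absent when SNAP has no plays for the pair
    if attempts is None:
        return line + ", no human plays recorded"
    return line + f", {attempts} human attempts"


# Hypothetical rows, for illustration only.
print(describe_pair({"source": "Cat", "target": "Steel", "shortest_path": 3}))
print(describe_pair({"source": "Cat", "target": "Steel", "shortest_path": 3,
                     "human_attempts": 7}))
```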