wikispeedia

Navigate Wikipedia articles by following hyperlinks to reach a target article. Based on the Wikispeedia game (West, Pineau & Precup, IJCAI 2009): the model is given a source and target article and must find a path by clicking links — only links present in the current article are reachable. Two tools are exposed:

click_link(article) — navigate to a linked article. Returns the article text and the outgoing-link menu.
go_back() — return to the previous article (undo the last click). Enabled by default; set allow_go_back=false to remove it.
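
To make the tool semantics concrete, here is a minimal sketch of the navigation state over a toy link graph. `ToyNavigator`, the toy articles, and the choice to raise on an invalid click are illustrative assumptions, not the environment's actual implementation; the real tools also return article text, which is omitted here.

```python
from typing import Dict, List


class ToyNavigator:
    """Hypothetical stand-in for the environment's navigation state."""

    def __init__(self, links: Dict[str, List[str]], source: str,
                 allow_go_back: bool = True) -> None:
        self.links = links
        self.history = [source]  # stack of visited articles; last item is current
        self.allow_go_back = allow_go_back

    @property
    def current(self) -> str:
        return self.history[-1]

    def click_link(self, article: str) -> List[str]:
        # Only links present in the current article are reachable.
        if article not in self.links.get(self.current, []):
            raise ValueError(f"{article!r} is not linked from {self.current!r}")
        self.history.append(article)
        return self.links.get(article, [])  # outgoing-link menu of the new article

    def go_back(self) -> str:
        # Undo the last click; a no-op when already at the source article.
        if not self.allow_go_back:
            raise RuntimeError("go_back is disabled (allow_go_back=false)")
        if len(self.history) > 1:
            self.history.pop()
        return self.current


# Toy graph (invented for illustration, not taken from the SNAP data).
toy_links = {"Cat": ["Mammal", "Pet"], "Mammal": ["Animal"], "Pet": [], "Animal": []}
nav = ToyNavigator(toy_links, source="Cat")
menu = nav.click_link("Pet")  # dead end: the menu is empty
nav.go_back()                 # back at "Cat"
nav.click_link("Mammal")
nav.click_link("Animal")      # target reached
```

With allow_go_back=false, the dead end at "Pet" would be unrecoverable, which is why that setting forces planning over backtracking.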

Overview

Environment ID: wikispeedia
Short description: Multi-turn graph navigation over a static Wikipedia subset; reward = reached target.
Tags: multi-turn, tool-use, navigation, rl

Datasets

Primary dataset: Stanford SNAP Wikispeedia — a condensed Wikipedia with 4,604 articles and ~120K directed hyperlinks (avg out-degree 26). Downloaded on first env access; cached under ~/.cache/wikispeedia.
Source links: Wikispeedia at SNAP, paper (IJCAI 2009)
Split: a fixed random train/eval partition with disjoint target articles.
- Articles are deterministically split into a train-target pool (~90%) and an eval-target pool (~10%); no target article ever appears in both splits.
- train (50,000 pairs): (source, target) pairs sampled uniformly at random, with the target drawn from the train pool and the shortest-path distance restricted to 3–8.
- eval (1,000 pairs): sampled the same way, with the target drawn from the eval pool and the same distance band.
- Sampling is seeded, so every run sees the same split. Split sizes are baked in (not configurable).
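
The split logic described above can be sketched roughly as follows. This is an illustration on a toy graph, not the environment's code: the function names, the seeds, the rejection-sampling loop, and the `max_attempts` guard are all assumptions.

```python
import random
from collections import deque
from typing import Dict, List, Tuple


def bfs_distances(links: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Shortest-path distance (in clicks) from source to every reachable article."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def split_targets(articles, eval_fraction: float = 0.1,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Deterministically partition articles into disjoint train/eval target pools."""
    ordered = sorted(articles)              # fixed order before the seeded shuffle
    random.Random(seed).shuffle(ordered)
    n_eval = max(1, round(len(ordered) * eval_fraction))
    return ordered[n_eval:], ordered[:n_eval]


def sample_pairs(links, targets, n_pairs, min_dist=3, max_dist=8,
                 seed=0, max_attempts=100_000):
    """Rejection-sample (source, target, distance) triples inside the distance band."""
    rng = random.Random(seed)
    sources, pool = sorted(links), sorted(targets)
    cache: Dict[str, Dict[str, int]] = {}
    pairs = []
    for _ in range(max_attempts):
        if len(pairs) == n_pairs:
            break
        src, tgt = rng.choice(sources), rng.choice(pool)
        if src not in cache:
            cache[src] = bfs_distances(links, src)
        d = cache[src].get(tgt)             # None if tgt is unreachable from src
        if d is not None and min_dist <= d <= max_dist:
            pairs.append((src, tgt, d))
    return pairs


# Toy directed ring: each node links to the next two nodes (invented data).
ring = {f"A{i:02d}": [f"A{(i + 1) % 12:02d}", f"A{(i + 2) % 12:02d}"] for i in range(12)}
train_pool, eval_pool = split_targets(ring)
train_pairs = sample_pairs(ring, train_pool, n_pairs=20, seed=1)
eval_pairs = sample_pairs(ring, eval_pool, n_pairs=5, seed=2)
```

Because only targets are partitioned, a source article may appear in both splits; what never overlaps is the article the agent is asked to reach.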

Quickstart

Run a single debug rollout

prime eval run wikispeedia -n1 -r1 -d -v

Full eval with prime

prime eval run wikispeedia -n 50 -r 1

Harder difficulty (longer shortest paths)

prime eval run wikispeedia -a '{"min_path_length": 5, "max_path_length": 8}'

Hide article body, expose only the link menu

prime eval run wikispeedia -a '{"links_only": true}'

Environment Arguments

| Arg | Type | Default | Description |
| --- | --- | --- | --- |
| `max_turns` | int | `50` | Hard cap on agent turns per rollout. |
| `min_path_length` | int | `3` | Minimum shortest-path distance between source and target. The graph supports distances 1..9; ~78% of pairs sit at dist 3–4. Tightening this filters both train and eval. |
| `max_path_length` | int | `8` | Maximum shortest-path distance. Only ~470 pairs sit at dist=8 and 5 pairs at dist=9, so the upper range is useful for probing the hardest tail. |
| `cache_dir` | str \| null | `null` | Override the cache directory for the SNAP tarballs. Falls back to `~/.cache/wikispeedia` (env var: `WIKISPEEDIA_CACHE_DIR`). |
| `links_only` | bool | `false` | If true, hide article bodies and show only the outgoing-link menu. An ablation for whether the agent navigates from semantic content or from link names alone. |
| `allow_go_back` | bool | `true` | If false, the `go_back` tool is not registered (and not mentioned in the system prompt). Every click is permanent, which forces planning over backtracking. |
| `train_only` | bool | `false` | If true, `vf-eval` runs on the train split (50K pairs) instead of the eval split (1K). A workaround for evaluating on the train split, which has no built-in flag. |

The default vf-eval invocation runs the full 1,000-pair eval split with 4 rollouts per example (num_examples=1000, rollouts_per_example=4); these are baked into pyproject.toml.
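
When launching many configurations, it can help to build the `-a` JSON payload programmatically rather than by hand. The sketch below is a hypothetical helper, not part of the environment: `build_eval_command` and its range check are assumptions (the check mirrors the 1..9 distance range from the table), and it only assembles the `prime eval run ... -a` command form shown in the Quickstart.

```python
import json
import shlex
from typing import Any, Dict


def build_eval_command(env_args: Dict[str, Any], env_id: str = "wikispeedia") -> str:
    """Return a shell-quoted `prime eval run` command with env_args passed via -a."""
    lo = env_args.get("min_path_length", 3)   # documented default
    hi = env_args.get("max_path_length", 8)   # documented default
    # Assumed sanity check: the graph only has shortest paths of length 1..9.
    if not (1 <= lo <= hi <= 9):
        raise ValueError(f"invalid path-length band: {lo}..{hi}")
    # json.dumps renders Python booleans as JSON true/false.
    payload = json.dumps(env_args)
    return shlex.join(["prime", "eval", "run", env_id, "-a", payload])


cmd = build_eval_command(
    {"min_path_length": 5, "max_path_length": 8, "allow_go_back": False}
)
print(cmd)
```

Quoting through `shlex.join` avoids the usual pitfalls of embedding double-quoted JSON inside a shell command.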

Metrics

| Metric | Weight | Meaning |
| --- | --- | --- |
| `reached_target` | 1.0 | 1.0 if the agent navigated to the target article, else 0.0. The main reward signal. |
| `path_efficiency` | 0.0 | `shortest_path / actual_steps` on successful rollouts (1.0 = optimal). Logged-only. |
| `path_length` | 0.0 | Number of links clicked. Logged-only. |

Because path_efficiency and path_length carry zero weight, the total reward is identical to reached_target.
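
As a worked example of how the three metrics and their weights combine, here is a small sketch. `score_rollout` is a hypothetical helper; it assumes the rollout ends on the click that reaches the target, that `shortest_path` is the shortest-path length in clicks, and that path_efficiency is reported as 0.0 on failed rollouts. None of these details are confirmed by the environment.

```python
from typing import Dict, List, Tuple

# Weights as listed in the metrics table.
WEIGHTS = {"reached_target": 1.0, "path_efficiency": 0.0, "path_length": 0.0}


def score_rollout(clicked: List[str], target: str,
                  shortest_path: int) -> Tuple[float, Dict[str, float]]:
    """Compute the logged metrics and the weighted reward for one rollout."""
    path_length = len(clicked)  # number of links clicked
    reached = 1.0 if clicked and clicked[-1] == target else 0.0
    # Efficiency is only meaningful on success; 0.0 on failure is an assumption.
    efficiency = shortest_path / path_length if reached else 0.0
    metrics = {
        "reached_target": reached,
        "path_efficiency": efficiency,
        "path_length": float(path_length),
    }
    reward = sum(WEIGHTS[name] * value for name, value in metrics.items())
    return reward, metrics


# Four clicks to reach a target whose shortest path is three clicks.
reward, metrics = score_rollout(
    ["Mammal", "Zoology", "Biology", "Animal"], target="Animal", shortest_path=3
)
```

Here the reward is 1.0 and path_efficiency is 0.75; a rollout that never reaches the target gets a reward of 0.0 regardless of how many links it clicked.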

Per-pair info

Each dataset row's info includes:

source, target, shortest_path
human_attempts, human_success_rate, human_avg_rating — aggregated SNAP human-play statistics for the pair. Present only when SNAP has at least one human play for the pair; absent otherwise. With the random split, this is sparse on both train and eval rows.
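
Since the human-play fields are absent on many rows, downstream code should treat them as optional. The sketch below aggregates them defensively; the rows are invented for illustration, `summarize_human_stats` is a hypothetical helper, and it assumes `human_success_rate` is a fraction in [0, 1] and `shortest_path` is a length in clicks.

```python
from typing import Any, Dict, List, Optional


def summarize_human_stats(infos: List[Dict[str, Any]]) -> Dict[str, Optional[float]]:
    """Report how many rows carry human stats and their attempt-weighted success rate."""
    with_humans = [info for info in infos if "human_attempts" in info]
    total_attempts = sum(info["human_attempts"] for info in with_humans)
    weighted = sum(
        info["human_attempts"] * info["human_success_rate"] for info in with_humans
    )
    return {
        "coverage": len(with_humans) / len(infos) if infos else 0.0,
        "human_success_rate": weighted / total_attempts if total_attempts else None,
    }


# Invented example rows: one pair with human plays, one without.
rows = [
    {"source": "Cat", "target": "Animal", "shortest_path": 3,
     "human_attempts": 4, "human_success_rate": 0.5, "human_avg_rating": 2.0},
    {"source": "Pet", "target": "Animal", "shortest_path": 4},
]
summary = summarize_human_stats(rows)
```

Checking for the key's presence, rather than indexing directly, keeps the same code working on rows with and without human plays.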