Book Extractor
Overview
- Environment ID: `versa/book-extractor`
- Short description: Train a model to accurately extract knowledge from a reference document using strategic tool-based navigation
- Tags: tool-use, knowledge-extraction, train, eval
Datasets
- Primary dataset: 30 extraction questions embedded in the environment (no external downloads)
- Split sizes: 30 train examples across 3 difficulty levels
- Source: Distillation of Becoming a Supple Leopard (Kelly Starrett) — mobility coaching reference (~12K chars)
Task
- Type: multi-turn tool use
- Output format: Plain text responses containing key facts from the source document
- Rubric overview:
  - key_fact_recall (weight 0.6): fraction of expected key facts found via case-insensitive substring match, with a 60% completeness threshold. Answers that recover fewer than 60% of the key facts score 0.0, so shallow answers earn no partial credit. Key facts are multi-word phrases that require extraction from the source.
  - tool_diversity_reward (weight 0.4): scaled reward for using multiple distinct tools (0.0, 0.25, 0.5, 1.0 for 0, 1, 2, 3+ distinct tools).
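
The recall rule above can be sketched in a few lines of Python. This is a minimal illustration, not the environment's actual scoring code: the function signature, the `threshold` parameter, and the sample key facts are all assumptions made for the example.

```python
def key_fact_recall(response: str, key_facts: list[str], threshold: float = 0.6) -> float:
    """Fraction of key facts found in the response, zeroed below the threshold.

    Matching is a case-insensitive substring test, as described in the rubric.
    The signature and the empty-list behaviour are assumptions for this sketch.
    """
    if not key_facts:
        return 0.0
    text = response.lower()
    found = sum(1 for fact in key_facts if fact.lower() in text)
    fraction = found / len(key_facts)
    # Below the completeness threshold the answer earns no credit at all.
    return fraction if fraction >= threshold else 0.0


# Placeholder key facts (hypothetical, not taken from the source document).
facts = ["red apple", "green pear", "blue plum", "ripe fig", "sour lime"]

shallow = "It mentions a red apple and a green pear."
deeper = "It mentions a Red Apple, a green pear and a BLUE PLUM."

print(key_fact_recall(shallow, facts))  # 2 of 5 found, below the threshold
print(key_fact_recall(deeper, facts))   # 3 of 5 found, at the threshold
```

With five key facts, finding two falls under the threshold and scores 0.0, while finding three meets it and scores the raw fraction.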
Difficulty Levels
| Level | Name | Questions | Max Turns | What It Tests |
|---|---|---|---|---|
| 1 | Factual | 10 | 3 | Single lookup — one tool call to find a direct answer |
| 2 | Structural | 14 | 4 | Cross-referencing — read tables, connect concepts across sections |
| 3 | Applied | 6 | 6 | Multi-step synthesis — diagnose faults, design sessions, explain exceptions |
Use the `level` arg to train on a specific tier:
prime rl run configs/lab/book-extractor-l1.toml # factual only
prime rl run configs/lab/book-extractor-l3.toml # applied only
prime rl run configs/lab/book-extractor.toml # all levels
Tools (11)
| Tool | Description |
|---|---|
| `search(query)` | Search for sections matching keywords |
| `list_sections()` | List all section headers (table of contents) |
| `read_section(name)` | Read full content of a named section |
| `read_range(offset, limit)` | Read a character range from the raw document |
| `find_table(query)` | Find markdown tables matching a query |
| `find_definition(term)` | Find where a term is defined |
| `get_context(query, chars)` | Get text surrounding a match |
| `count_matches(query)` | Count occurrences of a term across sections |
| `cross_reference(term1, term2)` | Find sections mentioning both terms |
| `get_document_stats()` | Get document overview statistics |
| `run_code(code)` | Execute Python with the document as the `doc` variable |
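
To make the navigation tools more concrete, here is a rough sketch of how three of them might behave on a small markdown document. It is an illustration only: the real tools take no document argument, whereas these hypothetical versions accept `doc` explicitly, and both the header-based section splitting and the sample `DOC` text are assumptions.

```python
import re

# Tiny stand-in document (hypothetical content, not the real reference text).
DOC = """# Intro
Alpha and beta appear here.

## Details
Only alpha appears here.
"""


def split_sections(doc: str) -> dict[str, str]:
    """Map each markdown header to the body text beneath it (assumed layout)."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in doc.splitlines():
        match = re.match(r"^#+\s+(.*)$", line)
        if match:
            current = match.group(1).strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def list_sections(doc: str) -> list[str]:
    """Section headers in document order."""
    return list(split_sections(doc))


def read_section(doc: str, name: str) -> str:
    """Body of a named section, or an empty string if the name is unknown."""
    return split_sections(doc).get(name, "")


def cross_reference(doc: str, term1: str, term2: str) -> list[str]:
    """Sections whose body mentions both terms (case-insensitive)."""
    t1, t2 = term1.lower(), term2.lower()
    return [
        name
        for name, body in split_sections(doc).items()
        if t1 in body.lower() and t2 in body.lower()
    ]


print(list_sections(DOC))
print(read_section(DOC, "Details"))
print(cross_reference(DOC, "alpha", "beta"))
```

A typical level 2 rollout might call `list_sections()` first, then `cross_reference(...)` to narrow down candidates, then `read_section(...)` on the result.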
Environment Arguments
| Arg | Type | Default | Description |
|---|---|---|---|
| `level` | `int \| None` | `None` | Difficulty level (1=factual, 2=structural, 3=applied). `None` for all levels |
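
The effect of `level` can be pictured as a filter over the question pool. The sketch below is hypothetical: `Question`, `select_questions`, and the placeholder pool are invented for illustration, with only the per-level question counts and turn limits taken from the Difficulty Levels table.

```python
from dataclasses import dataclass
from typing import Optional

# Turn limits per level, copied from the Difficulty Levels table.
MAX_TURNS = {1: 3, 2: 4, 3: 6}


@dataclass
class Question:
    """Hypothetical record for one extraction question."""
    prompt: str
    level: int


def select_questions(questions: list[Question], level: Optional[int] = None) -> list[Question]:
    """Return every question when level is None, else only the matching tier."""
    if level is None:
        return list(questions)
    if level not in MAX_TURNS:
        raise ValueError(f"level must be one of {sorted(MAX_TURNS)} or None, got {level!r}")
    return [q for q in questions if q.level == level]


# Placeholder pool with the same tier sizes as the table: 10, 14 and 6 questions.
pool = (
    [Question(f"factual question {i}", 1) for i in range(10)]
    + [Question(f"structural question {i}", 2) for i in range(14)]
    + [Question(f"applied question {i}", 3) for i in range(6)]
)

print(len(select_questions(pool)))           # all levels
print(len(select_questions(pool, level=3)))  # applied only
```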
Quickstart
prime env push environments/book_extractor # push to hub
prime rl run configs/lab/book-extractor.toml # launch training
prime rl run configs/lab/book-extractor-l1.toml # factual only
prime rl logs <run-id> -f # stream logs
Note: `prime eval run` requires paid credits. Use `prime rl` during the free beta.
Metrics
| Metric | Meaning |
|---|---|
| `reward` | Weighted sum: 0.6 × key_fact_recall + 0.4 × tool_diversity_reward |
| `key_fact_recall` | Fraction of expected key facts found (0.0 if < 60% found, else 0.6–1.0) |
| `tool_diversity_reward` | 0.0/0.25/0.5/1.0 for 0/1/2/3+ distinct tools used |
| `total_tool_calls` | Number of tool calls per rollout (tracked automatically) |
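
As a worked example of how the metrics combine, the following sketch computes the tool diversity tier and the weighted `reward`. The function names and signatures are assumptions for illustration; only the weights and the tier values come from the table above.

```python
def tool_diversity_reward(tools_used: list[str]) -> float:
    """0.0 / 0.25 / 0.5 / 1.0 for 0 / 1 / 2 / 3+ distinct tools."""
    distinct = len(set(tools_used))
    return {0: 0.0, 1: 0.25, 2: 0.5}.get(distinct, 1.0)


def total_reward(recall: float, tools_used: list[str]) -> float:
    """Weighted sum: 0.6 * key_fact_recall + 0.4 * tool_diversity_reward."""
    return 0.6 * recall + 0.4 * tool_diversity_reward(tools_used)


# Repeated calls to the same tool count once toward diversity.
one_tool = ["search", "search"]
three_tools = ["list_sections", "read_section", "find_table"]

print(total_reward(0.0, one_tool))     # recall below threshold, one distinct tool
print(total_reward(1.0, three_tools))  # full recall, three distinct tools
```

Because diversity carries 40% of the weight, a rollout that fails the recall threshold can still earn up to 0.4 from tool use alone, and a rollout with perfect recall but a single tool tops out at 0.6 + 0.4 × 0.25 = 0.7.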