Book Extractor

Overview

Environment ID: versa/book-extractor
Short description: Train a model to accurately extract knowledge from a reference document using strategic tool-based navigation
Tags: tool-use, knowledge-extraction, train, eval

Datasets

Primary dataset: 30 extraction questions embedded in the environment (no external downloads)
Split sizes: 30 train examples across 3 difficulty levels
Source: Distillation of Becoming a Supple Leopard (Kelly Starrett) — mobility coaching reference (~12K chars)

Task

Type: multi-turn tool use
Output format: Plain text responses containing key facts from the source document
Rubric overview:
- key_fact_recall (weight 0.6): fraction of expected key facts found in the response via case-insensitive substring match. A 60% completeness threshold applies: if fewer than 60% of the facts are found, the score is 0.0, so shallow answers earn no partial credit. Key facts are multi-word phrases that have to be extracted from the source.
- tool_diversity_reward (weight 0.4): stepped reward for using multiple distinct tools (0.0, 0.25, 0.5, 1.0 for 0, 1, 2, 3+ distinct tools)
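
The two rubric terms above can be sketched in Python as follows. This is a minimal illustration based only on the description in this README, not the environment's actual scoring code; the function names mirror the metric names, while the signatures, the `combined_reward` helper, and the placeholder key facts are assumptions made for the example.

```python
def key_fact_recall(response: str, key_facts: list[str], threshold: float = 0.6) -> float:
    """Fraction of key facts found by case-insensitive substring match.

    Returns 0.0 when the fraction falls below the completeness threshold.
    """
    if not key_facts:
        return 0.0
    text = response.lower()
    found = sum(1 for fact in key_facts if fact.lower() in text)
    fraction = found / len(key_facts)
    return fraction if fraction >= threshold else 0.0


def tool_diversity_reward(tools_used: list[str]) -> float:
    """Stepped reward: 0.0 / 0.25 / 0.5 / 1.0 for 0 / 1 / 2 / 3+ distinct tools."""
    distinct = len(set(tools_used))
    return {0: 0.0, 1: 0.25, 2: 0.5}.get(distinct, 1.0)


def combined_reward(response: str, key_facts: list[str], tools_used: list[str]) -> float:
    """Weighted sum using the weights stated in the rubric (0.6 and 0.4)."""
    return 0.6 * key_fact_recall(response, key_facts) + 0.4 * tool_diversity_reward(tools_used)


# Placeholder key facts (hypothetical, not taken from the real question set).
facts = [
    "first key phrase",
    "second key phrase",
    "third key phrase",
    "fourth key phrase",
    "fifth key phrase",
]
response = (
    "The First Key Phrase, second key phrase, third key phrase "
    "and fourth key phrase all appear here."
)
score = combined_reward(response, facts, ["search", "read_section", "search"])
print(round(score, 2))
```

In this example four of five facts are matched (recall 0.8) and two distinct tools are used (diversity 0.5), so the weighted sum is 0.6 × 0.8 + 0.4 × 0.5 = 0.68.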

Difficulty Levels

Level	Name	Questions	Max Turns	What It Tests
1	Factual	10	3	Single lookup — one tool call to find a direct answer
2	Structural	14	4	Cross-referencing — read tables, connect concepts across sections
3	Applied	6	6	Multi-step synthesis — diagnose faults, design sessions, explain exceptions

Use the `level` argument to train on a specific tier. Example configs:

prime rl run configs/lab/book-extractor-l1.toml   # factual only
prime rl run configs/lab/book-extractor-l3.toml   # applied only
prime rl run configs/lab/book-extractor.toml      # all levels

Tools (11)

Tool	Description
`search(query)`	Search for sections matching keywords
`list_sections()`	List all section headers (table of contents)
`read_section(name)`	Read full content of a named section
`read_range(offset, limit)`	Read a character range from the raw document
`find_table(query)`	Find markdown tables matching a query
`find_definition(term)`	Find where a term is defined
`get_context(query, chars)`	Get text surrounding a match
`count_matches(query)`	Count occurrences of a term across sections
`cross_reference(term1, term2)`	Find sections mentioning both terms
`get_document_stats()`	Get document overview statistics
`run_code(code)`	Execute Python with the document as `doc` variable
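
As an illustration of the kind of snippet a model might pass to `run_code`, the sketch below collects the text around every match of a term. It assumes `doc` is a plain Python string holding the full document text; the README only says the document is available as `doc`, so the type is an assumption. The sample `doc` value and the `snippets` helper are hypothetical and exist only to make the example self-contained.

```python
import re

# Stand-in for the `doc` variable that run_code is described as providing.
# Inside the real tool this assignment would not be needed.
doc = (
    "## Section A\nAlpha term appears here.\n\n"
    "## Section B\nAnother alpha mention.\n"
)


def snippets(text: str, term: str, width: int = 20) -> list[str]:
    """Return up to `width` characters of context on each side of every match."""
    results = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        results.append(text[start:end])
    return results


hits = snippets(doc, "alpha")
print(len(hits))
for hit in hits:
    print(repr(hit))
```

A single `run_code` call like this can replace several narrower lookups, though it counts as only one distinct tool toward `tool_diversity_reward`.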

Environment Arguments

Arg	Type	Default	Description
`level`	int | None	`None`	Difficulty level (1=factual, 2=structural, 3=applied). `None` selects all levels
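
The effect of `level` can be sketched as a simple filter over the question set. This is an assumption about how the argument behaves, inferred from the tables in this README rather than from the environment's source; the `select_questions` helper, the `MAX_TURNS` mapping name, and the `{"level": ...}` question shape are all hypothetical.

```python
from typing import Optional

# Turn budget per level, copied from the Difficulty Levels table.
MAX_TURNS = {1: 3, 2: 4, 3: 6}


def select_questions(questions: list[dict], level: Optional[int] = None) -> list[dict]:
    """Return the questions for one level, or all questions when level is None."""
    if level is None:
        return list(questions)
    if level not in MAX_TURNS:
        raise ValueError(f"level must be 1, 2, 3, or None; got {level!r}")
    return [q for q in questions if q["level"] == level]


# Placeholder question set with the per-level counts from the table (10 / 14 / 6).
questions = (
    [{"level": 1} for _ in range(10)]
    + [{"level": 2} for _ in range(14)]
    + [{"level": 3} for _ in range(6)]
)

print(len(select_questions(questions)))           # all levels
print(len(select_questions(questions, level=3)))  # applied only
```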

Quickstart

prime env push environments/book_extractor          # push to hub
prime rl run configs/lab/book-extractor.toml         # launch training
prime rl run configs/lab/book-extractor-l1.toml      # factual only
prime rl logs <run-id> -f                            # stream logs

Note: `prime eval run` requires paid credits. Use `prime rl` during the free beta.

Metrics

Metric	Meaning
`reward`	Weighted sum: 0.6 × key_fact_recall + 0.4 × tool_diversity_reward
`key_fact_recall`	Fraction of expected key facts found (0.0 if < 60% found, else 0.6–1.0)
`tool_diversity_reward`	0.0/0.25/0.5/1.0 for 0/1/2/3+ distinct tools used
`total_tool_calls`	Number of tool calls per rollout (tracked automatically)