calendar-scheduling
Procedural StatefulToolEnv for meeting-time negotiation under realistic calendar constraints.
Overview
- Environment ID: calendar-scheduling
- Type: multi-turn tool use (StatefulToolEnv)
- Objective: submit one meeting window that maximizes weighted attendee utility while satisfying hard constraints
- Reward:
  - 0.0 if the final submission is invalid or missing
  - otherwise the weighted average attendee utility in [0, 1]
Each task contains:
- attendees with busy calendars,
- required and optional participants,
- timezone offsets,
- hard local-time bounds (for a subset of attendees),
- soft preferences (day, early/late, back-to-back),
- room availability,
- fixed meeting duration.
Attendee importance weights are normalized to sum to 1.0 per task.
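As a minimal sketch of the reward rule described above (not the environment's actual code), the function below returns 0.0 for an invalid or missing submission and otherwise the weighted average of per-attendee utilities. The function name, the dictionary layout, and the `valid` flag are hypothetical and chosen for illustration only.

```python
def weighted_reward(utilities, weights, valid):
    """Hypothetical helper: weighted average attendee utility in [0, 1].

    utilities: attendee_id -> utility in [0, 1] for the submitted window
    weights:   attendee_id -> importance weight (assumed to sum to 1.0)
    valid:     False when the submission is invalid or missing
    """
    if not valid:
        return 0.0
    total = sum(weights.values())
    # Dividing by the total keeps the result in [0, 1] even if the
    # weights drift slightly from 1.0 through floating-point error.
    return sum(weights[a] * utilities[a] for a in weights) / total


example = weighted_reward(
    utilities={"alice": 1.0, "bob": 0.5},
    weights={"alice": 0.75, "bob": 0.25},
    valid=True,
)
print(example)
```

Because the weights are normalized per task, an attendee with weight 0.75 contributes three times as much to the reward as one with weight 0.25.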
Tools
- check_attendee_calendar(attendee_id, day_index)
- view_attendee_constraints(attendee_id)
- check_proposal(day_index, start_time_utc, duration_minutes, room_id)
- submit_window(day_index, start_time_utc, duration_minutes, room_id)
All tool responses include remaining_turns. check_proposal is limited by a per-task score-check budget to discourage brute-force probing.
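The budget mechanism can be pictured with the following sketch. It is an illustration under stated assumptions, not the environment's implementation: the class name, the response keys other than remaining_turns, and the choice to consume a turn even when the budget is exhausted are all hypothetical.

```python
class ScoreCheckBudget:
    """Hypothetical tracker for turns and check_proposal calls."""

    def __init__(self, score_check_budget, max_turns):
        self.checks_remaining = score_check_budget
        self.remaining_turns = max_turns

    def check_proposal(self, score_fn, proposal):
        # Assumption: every tool call costs one turn, including rejected ones.
        self.remaining_turns -= 1
        if self.checks_remaining <= 0:
            return {
                "error": "score-check budget exhausted",
                "remaining_turns": self.remaining_turns,
            }
        self.checks_remaining -= 1
        return {
            "score": score_fn(proposal),
            "score_checks_remaining": self.checks_remaining,
            "remaining_turns": self.remaining_turns,
        }


tracker = ScoreCheckBudget(score_check_budget=1, max_turns=5)
first = tracker.check_proposal(lambda p: 0.5, {"day_index": 0})
second = tracker.check_proposal(lambda p: 0.5, {"day_index": 1})
print(first, second)
```

With a budget of 1, the second call returns an error instead of a score, so an agent cannot enumerate every window through check_proposal.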
Deterministic Oracle
Generation uses deterministic rejection sampling plus exhaustive search over candidate windows to compute:
- optimal_score
- best_proposals
- valid candidate count and search diagnostics
This metadata is stored with each example and used for metrics (for example, the submission-to-optimal ratio).
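The exhaustive-search step might look roughly like the sketch below. It assumes a candidate is a (day_index, start_time_utc, room_id) tuple and that validity and scoring are supplied as callables; the function name, the tuple layout, and the returned key valid_candidate_count are hypothetical, and the rejection-sampling part of generation is not shown.

```python
from itertools import product


def exhaustive_oracle(days, start_times, rooms, is_valid, score):
    """Hypothetical oracle: scan every candidate window and keep the best."""
    optimal_score = None
    best_proposals = []
    valid_count = 0
    # product() iterates in a fixed order, so the result is deterministic.
    for candidate in product(days, start_times, rooms):
        if not is_valid(candidate):
            continue
        valid_count += 1
        s = score(candidate)
        if optimal_score is None or s > optimal_score:
            optimal_score, best_proposals = s, [candidate]
        elif s == optimal_score:
            best_proposals.append(candidate)  # keep ties
    return {
        "optimal_score": optimal_score,
        "best_proposals": best_proposals,
        "valid_candidate_count": valid_count,
    }


oracle = exhaustive_oracle(
    days=range(2),
    start_times=[540, 600],  # assumed to be minutes after midnight UTC
    rooms=["r1"],
    is_valid=lambda c: c[1] != 540 or c[0] == 0,
    score=lambda c: 0.9 if c[1] == 540 else 0.5,
)
print(oracle)
```

Exhaustive search is only practical because the candidate space (days, start times, rooms) is small and discrete; it guarantees that optimal_score is a true upper bound for any submission.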
Quickstart
Install the local environment:
prime env install calendar-scheduling
Run a small eval:
prime eval run calendar-scheduling -n 5 -r 1 -m qwen3-30b-i -e configs/endpoints.toml --skip-upload
Run with custom environment args:
prime eval run calendar-scheduling -n 8 -r 1 -m qwen3-30b-i -e configs/endpoints.toml --skip-upload -a '{"difficulty": "hard", "max_turns": 12, "generator_overrides": {"score_check_budget": 3}}'
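Hand-writing the JSON passed to -a is error-prone once the overrides grow. The sketch below builds the same argument string from a Python dictionary and prints a shell-safe command line; it only assembles text and does not invoke the CLI. The variable names are illustrative, and the flags are copied from the command above.

```python
import json
import shlex

# Same environment arguments as in the command above.
env_args = {
    "difficulty": "hard",
    "max_turns": 12,
    "generator_overrides": {"score_check_budget": 3},
}

arg_string = json.dumps(env_args)

command = [
    "prime", "eval", "run", "calendar-scheduling",
    "-n", "8", "-r", "1",
    "-m", "qwen3-30b-i",
    "-e", "configs/endpoints.toml",
    "--skip-upload",
    "-a", arg_string,
]

# shlex.join quotes the JSON so it survives shell parsing.
print(shlex.join(command))
```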
Standalone TUI
Render a generated problem in the terminal:
uv run --project environments/calendar_scheduling calendar-scheduling-tui --difficulty medium --seed 42 --show-oracle
The TUI shows attendees, room lanes, busy blocks, and highlights oracle windows when --show-oracle is enabled.
Environment Arguments
| Arg | Type | Default | Description |
|---|---|---|---|
| difficulty | str | "medium" | High-level task preset (easy, medium, hard) |
| num_train | int | 512 | Number of generated training tasks |
| num_eval | int | 128 | Number of generated evaluation tasks |
| num_examples | int \| None | None | Backwards-compatible alias: sets both train and eval sizes |
| seed | int | 7 | Base deterministic seed |
| max_turns | int \| None | None | Turn cap per rollout (preset default when unset) |
| generator_overrides | dict \| None | None | Fine-grained generation overrides for GenerationConfig fields |
Common generator_overrides keys:
- attendee_count_range
- window_days_range
- meeting_duration_choices
- score_check_budget
- min_valid_candidates
- max_valid_ratio
- max_random_baseline_score
- max_generation_attempts
Metrics
| Metric | Meaning |
|---|---|
| reward | Final reward (submitted valid score or 0.0) |
| submission_made | Whether agent called submit_window |
| submission_valid | Whether submitted window passed hard constraints |
| oracle_optimal_score | Exhaustive best possible score for the task |
| submitted_to_optimal_ratio | submitted_score / oracle_optimal_score (clamped) |
| optimality_gap | oracle_optimal_score - submitted_score |
| score_checks_used | Number of check_proposal calls consumed |
| score_checks_remaining | Remaining proposal-check budget |
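
The two oracle-relative metrics follow directly from the formulas in the table. The sketch below is one way to compute them; the function name is hypothetical, and both the clamping range of [0, 1] and the handling of a zero oracle score are assumptions, since the table only says the ratio is clamped.

```python
def oracle_relative_metrics(submitted_score, oracle_optimal_score):
    """Hypothetical helper for the ratio and gap metrics."""
    if oracle_optimal_score <= 0:
        # Assumption: avoid division by zero by reporting a ratio of 0.0.
        ratio = 0.0
    else:
        # Assumption: "clamped" means clamped to [0, 1].
        ratio = min(1.0, max(0.0, submitted_score / oracle_optimal_score))
    return {
        "submitted_to_optimal_ratio": ratio,
        "optimality_gap": oracle_optimal_score - submitted_score,
    }


metrics = oracle_relative_metrics(submitted_score=0.25, oracle_optimal_score=0.5)
print(metrics)
```

An invalid or missing submission has a submitted score of 0.0, so its ratio is 0.0 and its gap equals the oracle score.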