Shell Agent Bench

Virtual terminal debugging environment for Laguna XS.2 agentic RL

Type: RL Env
Runtime: agent
License: unknown
Size: v0.1.6
Published: May 2026

Overview

  • Environment ID: shell-agent-bench
  • Task type: multi-turn tool use in a virtual repository
  • Goal: improve terminal-style debugging, file inspection, minimal editing, and test-driven completion

Tooling

The model receives provider-neutral tool definitions:

  • run(command), a safe virtual shell subset for ls, find, cat, sed -n, grep -R, and pytest
  • edit_file(path, old, new), exact replacement patching
  • write_file(path, content), whole-file overwrite fallback
  • finish(summary), final task completion signal
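
To make the difference between the two editing tools concrete, here is a minimal sketch of exact-replacement patching against an in-memory repository. The `VirtualRepo` class, its error strings, and the rule that `old` must match exactly once are assumptions made for illustration; the environment's actual implementation may differ.

```python
class VirtualRepo:
    """Hypothetical in-memory repository: maps file paths to text content."""

    def __init__(self, files):
        self.files = dict(files)

    def edit_file(self, path, old, new):
        """Replace `old` with `new` in `path`, requiring a single exact match."""
        if path not in self.files:
            return f"error: {path} not found"
        if not old:
            return "error: old text must be non-empty"
        text = self.files[path]
        count = text.count(old)
        if count != 1:
            # Assumed policy: refuse missing or ambiguous matches.
            return f"error: expected 1 match for old text, found {count}"
        self.files[path] = text.replace(old, new, 1)
        return "ok"

    def write_file(self, path, content):
        """Overwrite (or create) the whole file; the fallback when patching fails."""
        self.files[path] = content
        return "ok"


repo = VirtualRepo({"calc.py": "def add(a, b):\n    return a - b\n"})
result = repo.edit_file("calc.py", "a - b", "a + b")
```

Requiring a unique match keeps edits minimal and unambiguous, which fits the stated goal of minimal editing; `write_file` remains available when no unique anchor string exists.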

Reward

The optimization reward is binary: a rollout is rewarded only if the hidden virtual tests pass. Additional metrics (partial check fraction, test use, edits, finish calls, and tool errors) are logged for analysis and do not affect the reward.
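
The split between the reward and the logged metrics can be sketched as follows. The `RolloutStats` fields, the function names, and the reading of "test success" as "every hidden check passes" are assumptions for illustration, not the environment's actual code.

```python
from dataclasses import dataclass


@dataclass
class RolloutStats:
    """Hypothetical per-rollout counters collected during an episode."""
    checks_passed: int
    checks_total: int
    pytest_calls: int
    edits: int
    finish_calls: int
    tool_errors: int


def reward(stats):
    """Binary reward: 1.0 only if all hidden checks pass (assumed criterion)."""
    all_passed = stats.checks_total > 0 and stats.checks_passed == stats.checks_total
    return 1.0 if all_passed else 0.0


def metrics(stats):
    """Diagnostics logged next to the reward; they never change it."""
    fraction = stats.checks_passed / stats.checks_total if stats.checks_total else 0.0
    return {
        "partial_check_fraction": fraction,
        "used_tests": float(stats.pytest_calls > 0),
        "edits": float(stats.edits),
        "finish_calls": float(stats.finish_calls),
        "tool_errors": float(stats.tool_errors),
    }


partial = RolloutStats(checks_passed=3, checks_total=4, pytest_calls=2,
                       edits=1, finish_calls=1, tool_errors=0)
partial_reward = reward(partial)
partial_metrics = metrics(partial)
```

In this sketch a rollout that passes three of four checks still receives zero reward, while the partial fraction remains visible in the logs.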

Quickstart

prime eval run shell-agent-bench -m poolside/laguna-xs.2 -n 4 -r 2 -t 512 -T 0.7

Environment arguments

Arg                Default  Meaning
split              train    Training split loaded from tasks.jsonl
eval_split         eval     Evaluation split loaded from tasks.jsonl
max_examples       -1       Limit training examples
max_eval_examples  -1       Limit eval examples
max_turns          8        Maximum assistant turns
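
As a rough illustration of how the split and limit arguments could interact, the sketch below filters JSON Lines records by split and then applies an example cap. The `load_split` helper, the per-record `split` and `id` fields, and the reading of -1 as "no limit" are all assumptions; the layout of tasks.jsonl is not documented here.

```python
import io
import json


def load_split(lines, split, max_examples=-1):
    """Return tasks whose (assumed) "split" field matches, optionally capped."""
    tasks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        task = json.loads(line)
        if task.get("split") == split:
            tasks.append(task)
    if max_examples >= 0:
        # Assumption: a negative value such as -1 means "no limit".
        tasks = tasks[:max_examples]
    return tasks


# Stand-in for tasks.jsonl; the record shape is hypothetical.
sample = io.StringIO(
    '{"id": "t1", "split": "train"}\n'
    '{"id": "t2", "split": "eval"}\n'
    '{"id": "t3", "split": "train"}\n'
)
train = load_split(sample, "train", max_examples=1)
```

With a real file, the same helper could be called on an open file handle, for example `load_split(open("tasks.jsonl"), "eval")`.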