0

Intern Bootcamp RL Env (Community)

Intern Bootcamp RL Env (Community) is an RL environment.

Type
RL Env
License
mit
Published
May 2025

Cite

Notes

Only stored in your browser.

InternBootcamp RL Training Environment

Overview

The InternBootcamp RL Training Environment is a flexible and extensible framework for training large reasoning models using reinforcement learning on verifiable reasoning tasks. Based on the InternBootcamp library, this environment provides a seamless integration between InternBootcamp's comprehensive collection of reasoning tasks and the Atropos RL training infrastructure.

How InternBootcamp Works

InternBootcamp is a library that provides:

  1. Standardized Task Interface: Each task (called a "bootcamp") implements three core methods:

    • case_generator(): Generates problem instances with controllable difficulty
    • prompt_func(): Converts problem instances into natural language prompts
    • verify_score(): Verifies and scores model responses
  2. Diverse Task Coverage: Over 1,000 verifiable reasoning tasks including:

    • Logic puzzles (e.g., Game24, Sudoku, N-Queens)
    • Mathematical problems (algebra, geometry, calculus)
    • Algorithm challenges (sorting, searching, optimization)
    • Game-based reasoning (chess, Go, strategic games)
    • Pattern recognition and sequence problems
  3. Automatic Task Generation: Tasks can generate unlimited problem instances with:

    • Controllable difficulty parameters
    • Consistent verification methods
    • Scalable complexity

Architecture

InternBootcamp RL Environment
├── Task Selection Layer
│   ├── Single Task Mode (train on one specific bootcamp)
│   ├── Multi-Task Mode (train on multiple bootcamps - TBD)
│   └── Curriculum Mode (progressive difficulty - TBD)
│
├── InternBootcamp Integration
│   ├── Bootcamp Registry (dynamic task discovery)
│   ├── Bootcamp Instance Management
│   ├── Problem Generation Pipeline
│   └── Response Verification System
│
├── RL Training Loop
│   ├── Trajectory Collection
│   ├── Reward Calculation
│   └── Policy Updates
│
└── Atropos Base Environment
    ├── Server Management
    ├── Batch Processing
    └── Wandb Logging

Key Features

1. Dynamic Task Discovery

The environment automatically discovers all available bootcamp tasks (1000+) without manual imports:

from environments.intern_bootcamp.bootcamp_registry import get_available_bootcamps

# List all available tasks
tasks = get_available_bootcamps()
print(f"Found {len(tasks)} bootcamp tasks")
# Output: Found 1069 bootcamp tasks

2. Simple Task Selection

Train on any available bootcamp task by name:

# Train on Game24
env = InternBootcampEnv(task_name="Game24bootcamp", task_params={"num_numbers": 4})

# Train on Sudoku
env = InternBootcampEnv(task_name="Sudokubootcamp")

# Train on Maze solving
env = InternBootcampEnv(task_name="Mazebootcamp")

3. Automatic Problem Generation

Each training step:

  1. Instantiates the selected bootcamp with specified parameters
  2. Generates a new problem instance using case_generator()
  3. Converts it to a natural language prompt via prompt_func()
  4. Collects model responses
  5. Verifies correctness using verify_score()

4. Flexible Reward System

  • Base rewards: Correct/incorrect responses (configurable)
  • Format bonuses: Proper answer formatting (e.g., \boxed{} for math)
  • Reasoning bonuses: Quality of step-by-step explanations
  • Task-specific scoring: Each bootcamp can define its own scoring logic

Installation

  1. Clone the repository and navigate to the environment:
cd environments/intern_bootcamp
  1. Install InternBootcamp (already included as a submodule):
cd internbootcamp_lib && uv pip install -e .

Usage Examples

1. Single Task Training

Train on Game24 puzzles with specific difficulty:

python -m environments.intern_bootcamp serve \
    --env--task_name "Game24bootcamp" \
    --env--task_params '{"num_numbers": 4, "range_max": 100}' \
    --env--group_size 8 \
    --env--total_steps 10000

2. Exploring Available Tasks

List all available bootcamp tasks:

from environments.intern_bootcamp.bootcamp_registry import get_available_bootcamps

tasks = get_available_bootcamps()
for task in tasks[:20]:  # Show first 20
    print(task)

3. Custom Configuration File

Use a YAML configuration for training:

# config/intern_bootcamp_game24.yaml
env:
  task_name: "Game24bootcamp"
  task_params:
    num_numbers: 4
    range_max: 50
    target_max: 50

  correct_reward: 1.0
  incorrect_reward: -0.5
  format_bonus: 0.2

  group_size: 8
  total_steps: 10000
  steps_per_eval: 100

openai:
  model_name: "gpt-4"
  temperature: 0.7
  max_tokens: 2048

Run with config:

python -m environments.intern_bootcamp serve --config config/intern_bootcamp_game24.yaml

Available Bootcamp Tasks

The environment supports over 1000 bootcamp tasks. Some examples include:

  • Math & Logic: Game24bootcamp, Sudokubootcamp, Kakurobootcamp
  • Algorithms: Mazebootcamp, Slitherlinkbootcamp, Bridgesbootcamp
  • Games: InternGObootcamp, Chessbootcamp
  • Pattern Recognition: Arcbootcamp, Nonogramsbootcamp
  • Code Generation: CodeIObootcamp, BigCodeBenchbootcamp
  • Language Tasks: Cipherbootcamp, WordSortingbootcamp

Use get_available_bootcamps() to see the full list.

Implementation Details

Environment Configuration

class InternBootcampEnvConfig(BaseEnvConfig):
    # Task selection
    task_name: str = "Game24bootcamp"  # Bootcamp task name
    task_params: Dict[str, Any] = {}   # Task-specific parameters

    # Reward configuration
    correct_reward: float = 1.0
    incorrect_reward: float = -0.5
    format_bonus: float = 0.2

    # Training parameters
    require_reasoning: bool = True
    min_reasoning_length: int = 50
    temperature: float = 0.7
    top_p: float = 0.9

Bootcamp Registry

The environment uses a dynamic registry system to discover and manage bootcamp tasks:

from environments.intern_bootcamp.bootcamp_registry import (
    create_bootcamp,
    get_available_bootcamps,
    bootcamp_registry
)

# Create a bootcamp instance
bootcamp = create_bootcamp("Game24bootcamp", num_numbers=4, range_max=50)

# Get information about a bootcamp
info = bootcamp_registry.get_bootcamp_info("Game24bootcamp")
print(info["parameters"])  # Shows accepted parameters

Evaluation and Metrics

The environment tracks comprehensive metrics:

Performance Metrics

  • Task accuracy: Success rate on the specific bootcamp task
  • Format compliance: Rate of properly formatted responses
  • Reasoning quality: Length and coherence of explanations

Training Metrics

  • Reward statistics: Mean, std, min, max rewards
  • Problem diversity: Variety of generated problems
  • Learning progress: Improvement over time

Troubleshooting

Common Issues

  1. Task Not Found

    ValueError: Unknown bootcamp: XYZBootcamp
    

    Solution: Check available tasks with get_available_bootcamps()

  2. Import Errors

    ImportError: No module named 'internbootcamp'
    

    Solution: Install InternBootcamp: cd internbootcamp_lib && pip install -e .

  3. Parameter Errors

    TypeError: __init__() got an unexpected keyword argument
    

    Solution: Check accepted parameters with bootcamp_registry.get_bootcamp_info(task_name)

Future Enhancements

  1. Multi-Task Training: Train on multiple bootcamps simultaneously
  2. Curriculum Learning: Progressive difficulty advancement
  3. Task Composition: Combine multiple bootcamps into complex reasoning chains
  4. Custom Bootcamps: Easy integration of new reasoning tasks

Contributing

To add new features or improvements:

  1. Fork the repository
  2. Create a feature branch
  3. Implement your changes following the existing patterns
  4. Add tests for new functionality
  5. Submit a pull request with a clear description

License

This environment follows the same license as the Atropos framework and InternBootcamp library.