InternBootcamp RL Training Environment

Overview

The InternBootcamp RL Training Environment is a flexible, extensible framework for training large reasoning models with reinforcement learning on verifiable reasoning tasks. Built on the InternBootcamp library, it connects InternBootcamp's large collection of reasoning tasks to the Atropos RL training infrastructure.

How InternBootcamp Works

InternBootcamp is a library that provides:

Standardized Task Interface: Each task (called a "bootcamp") implements three core methods:
- case_generator(): Generates problem instances with controllable difficulty
- prompt_func(): Converts problem instances into natural language prompts
- verify_score(): Verifies and scores model responses
Diverse Task Coverage: Over 1,000 verifiable reasoning tasks including:
- Logic puzzles (e.g., Game24, Sudoku, N-Queens)
- Mathematical problems (algebra, geometry, calculus)
- Algorithm challenges (sorting, searching, optimization)
- Game-based reasoning (chess, Go, strategic games)
- Pattern recognition and sequence problems
Automatic Task Generation: Tasks can generate unlimited problem instances with:
- Controllable difficulty parameters
- Consistent verification methods
- Scalable complexity

Architecture

```
InternBootcamp RL Environment
├── Task Selection Layer
│   ├── Single Task Mode (train on one specific bootcamp)
│   ├── Multi-Task Mode (train on multiple bootcamps - TBD)
│   └── Curriculum Mode (progressive difficulty - TBD)
│
├── InternBootcamp Integration
│   ├── Bootcamp Registry (dynamic task discovery)
│   ├── Bootcamp Instance Management
│   ├── Problem Generation Pipeline
│   └── Response Verification System
│
├── RL Training Loop
│   ├── Trajectory Collection
│   ├── Reward Calculation
│   └── Policy Updates
│
└── Atropos Base Environment
    ├── Server Management
    ├── Batch Processing
    └── Wandb Logging
```

Key Features

1. Dynamic Task Discovery

The environment automatically discovers all available bootcamp tasks (1000+) without manual imports:

```python
from environments.intern_bootcamp.bootcamp_registry import get_available_bootcamps

# List all available tasks
tasks = get_available_bootcamps()
print(f"Found {len(tasks)} bootcamp tasks")
# Output: Found 1069 bootcamp tasks
```

2. Simple Task Selection

Train on any available bootcamp task by name:

```python
# Train on Game24
env = InternBootcampEnv(task_name="Game24bootcamp", task_params={"num_numbers": 4})

# Train on Sudoku
env = InternBootcampEnv(task_name="Sudokubootcamp")

# Train on Maze solving
env = InternBootcampEnv(task_name="Mazebootcamp")
```

3. Automatic Problem Generation

Each training step:

Instantiates the selected bootcamp with specified parameters
Generates a new problem instance using case_generator()
Converts it to a natural language prompt via prompt_func()
Collects model responses
Verifies correctness using verify_score()
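The per-step flow above can be illustrated with a toy stand-in for a bootcamp. This is a minimal sketch, not InternBootcamp code: `ToyAdditionBootcamp`, `training_step`, and `oracle_policy` are hypothetical names, and real bootcamps define their own `case_generator()`, `prompt_func()`, and `verify_score()` signatures. Only the generate, prompt, respond, verify sequence is meant to carry over.

```python
import random
import re
from typing import Callable, Dict, List, Optional, Tuple


class ToyAdditionBootcamp:
    """Hypothetical bootcamp stand-in; real bootcamps come from InternBootcamp."""

    def __init__(self, max_value: int = 100, seed: Optional[int] = None):
        # max_value plays the role of a controllable difficulty parameter.
        self.max_value = max_value
        self.rng = random.Random(seed)

    def case_generator(self) -> Dict[str, int]:
        # Generate a fresh problem instance (the "identity" used for verification).
        return {"a": self.rng.randint(0, self.max_value), "b": self.rng.randint(0, self.max_value)}

    def prompt_func(self, identity: Dict[str, int]) -> str:
        # Turn the instance into a natural language prompt.
        return f"What is {identity['a']} + {identity['b']}? Put the final answer in \\boxed{{}}."

    @staticmethod
    def verify_score(response: str, identity: Dict[str, int]) -> float:
        # Score the last \boxed{...} answer: 1.0 if correct, 0.0 otherwise.
        answers = re.findall(r"\\boxed\{(-?\d+)\}", response)
        if not answers:
            return 0.0
        return 1.0 if int(answers[-1]) == identity["a"] + identity["b"] else 0.0


def training_step(
    bootcamp: ToyAdditionBootcamp, policy: Callable[[str], str], group_size: int = 4
) -> Tuple[Dict[str, int], str, List[float]]:
    identity = bootcamp.case_generator()                        # new problem instance
    prompt = bootcamp.prompt_func(identity)                     # natural language prompt
    responses = [policy(prompt) for _ in range(group_size)]     # collect a group of responses
    scores = [bootcamp.verify_score(r, identity) for r in responses]  # verify each one
    return identity, prompt, scores


def oracle_policy(prompt: str) -> str:
    # Stand-in for the model: reads the two operands and answers correctly.
    a, b = map(int, re.findall(r"\d+", prompt)[:2])
    return f"{a} plus {b} is \\boxed{{{a + b}}}"


bootcamp = ToyAdditionBootcamp(max_value=10, seed=0)
identity, prompt, scores = training_step(bootcamp, oracle_policy, group_size=3)
print(prompt, scores)
```

In the real environment the policy is the model being trained, and the per-response scores feed the reward calculation rather than being returned directly.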

4. Flexible Reward System

Base rewards: Correct/incorrect responses (configurable)
Format bonuses: Proper answer formatting (e.g., \boxed{} for math)
Reasoning bonuses: Quality of step-by-step explanations
Task-specific scoring: Each bootcamp can define its own scoring logic
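One plausible way to combine the base reward and the format bonus is sketched below, using the default values from `InternBootcampEnvConfig`. This is an assumption for illustration: `compute_reward` and `RewardConfig` are hypothetical names, the format bonus is assumed to apply whenever a `\boxed{}` answer is present (correct or not), and the reasoning bonus is left out because its weighting is not specified here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class RewardConfig:
    # Defaults mirror the reward fields in InternBootcampEnvConfig.
    correct_reward: float = 1.0
    incorrect_reward: float = -0.5
    format_bonus: float = 0.2


# Matches a \boxed{...} answer without nested braces.
BOXED = re.compile(r"\\boxed\{[^{}]*\}")


def compute_reward(response: str, is_correct: bool, cfg: Optional[RewardConfig] = None) -> float:
    """Hypothetical reward: base reward from verification plus a format bonus."""
    cfg = cfg or RewardConfig()
    reward = cfg.correct_reward if is_correct else cfg.incorrect_reward
    # Assumption: the bonus is granted whenever the answer is boxed.
    if BOXED.search(response):
        reward += cfg.format_bonus
    return reward
```

With these defaults a correct boxed answer earns 1.2 and an incorrect boxed answer earns -0.3; whether incorrect answers should receive the format bonus at all is a design choice worth checking against the actual scoring code.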

Installation

Clone the repository and navigate to the environment:

```bash
cd environments/intern_bootcamp
```

Install InternBootcamp (already included as a submodule):

```bash
cd internbootcamp_lib && uv pip install -e .
```

Usage Examples

1. Single Task Training

Train on Game24 puzzles with specific difficulty:

```bash
python -m environments.intern_bootcamp serve \
    --env--task_name "Game24bootcamp" \
    --env--task_params '{"num_numbers": 4, "range_max": 100}' \
    --env--group_size 8 \
    --env--total_steps 10000
```

2. Exploring Available Tasks

List all available bootcamp tasks:

```python
from environments.intern_bootcamp.bootcamp_registry import get_available_bootcamps

tasks = get_available_bootcamps()
for task in tasks[:20]:  # Show first 20
    print(task)
```

3. Custom Configuration File

Use a YAML configuration for training:

```yaml
# config/intern_bootcamp_game24.yaml
env:
  task_name: "Game24bootcamp"
  task_params:
    num_numbers: 4
    range_max: 50
    target_max: 50

  correct_reward: 1.0
  incorrect_reward: -0.5
  format_bonus: 0.2

  group_size: 8
  total_steps: 10000
  steps_per_eval: 100

openai:
  model_name: "gpt-4"
  temperature: 0.7
  max_tokens: 2048
```

Run with config:

```bash
python -m environments.intern_bootcamp serve --config config/intern_bootcamp_game24.yaml
```

Available Bootcamp Tasks

The environment supports over 1000 bootcamp tasks. Some examples include:

Math & Logic: Game24bootcamp, Sudokubootcamp, Kakurobootcamp
Algorithms: Mazebootcamp, Slitherlinkbootcamp, Bridgesbootcamp
Games: InternGObootcamp, Chessbootcamp
Pattern Recognition: Arcbootcamp, Nonogramsbootcamp
Code Generation: CodeIObootcamp, BigCodeBenchbootcamp
Language Tasks: Cipherbootcamp, WordSortingbootcamp

Use get_available_bootcamps() to see the full list.

Implementation Details

Environment Configuration

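```python
from typing import Any, Dict

from pydantic import Field

from atroposlib.envs.base import BaseEnvConfig


class InternBootcampEnvConfig(BaseEnvConfig):
    # Task selection
    task_name: str = "Game24bootcamp"  # Bootcamp task name
    task_params: Dict[str, Any] = Field(default_factory=dict)  # Task-specific constructor parameters

    # Reward configuration
    correct_reward: float = 1.0    # Base reward for a verified-correct response
    incorrect_reward: float = -0.5  # Base reward for an incorrect response
    format_bonus: float = 0.2      # Bonus for proper answer formatting (e.g. \boxed{})

    # Training parameters
    require_reasoning: bool = True   # Require step-by-step reasoning in responses
    min_reasoning_length: int = 50   # Minimum length of the reasoning section
    temperature: float = 0.7         # Sampling temperature
    top_p: float = 0.9               # Nucleus sampling threshold
```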

Bootcamp Registry

The environment uses a dynamic registry system to discover and manage bootcamp tasks:

```python
from environments.intern_bootcamp.bootcamp_registry import (
    create_bootcamp,
    get_available_bootcamps,
    bootcamp_registry
)

# Create a bootcamp instance
bootcamp = create_bootcamp("Game24bootcamp", num_numbers=4, range_max=50)

# Get information about a bootcamp
info = bootcamp_registry.get_bootcamp_info("Game24bootcamp")
print(info["parameters"])  # Shows accepted parameters
```
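A registry of this kind can be built on Python's `inspect` module. The sketch below is not the actual `bootcamp_registry` implementation: `SimpleBootcampRegistry` is a hypothetical name, discovery is assumed to mean "every class in a module whose name ends in `bootcamp`", the base class name `Basebootcamp` is an assumption, and a throwaway module stands in for the InternBootcamp package so the example runs on its own.

```python
import inspect
import types
from typing import Any, Dict, Iterable, List, Type


class SimpleBootcampRegistry:
    """Hypothetical registry: discovers classes named '*bootcamp' in a module."""

    def __init__(self) -> None:
        self._bootcamps: Dict[str, Type] = {}

    def discover(self, module: types.ModuleType, exclude: Iterable[str] = ("Basebootcamp",)) -> None:
        # Scan the module namespace instead of importing each task by hand.
        skipped = set(exclude)
        for name, cls in inspect.getmembers(module, inspect.isclass):
            if name.lower().endswith("bootcamp") and name not in skipped:
                self._bootcamps[name] = cls

    def names(self) -> List[str]:
        return sorted(self._bootcamps)

    def _lookup(self, name: str) -> Type:
        if name not in self._bootcamps:
            raise ValueError(f"Unknown bootcamp: {name}")
        return self._bootcamps[name]

    def create(self, name: str, **params: Any) -> Any:
        # Unknown keyword arguments surface as a TypeError from the constructor.
        return self._lookup(name)(**params)

    def info(self, name: str) -> Dict[str, Any]:
        # Report constructor parameters and their defaults (None marks "no default").
        sig = inspect.signature(self._lookup(name))
        params = {
            p.name: None if p.default is inspect.Parameter.empty else p.default
            for p in sig.parameters.values()
            if p.kind not in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
        }
        return {"name": name, "parameters": params}


# Throwaway module standing in for the InternBootcamp package.
fake_module = types.ModuleType("fake_internbootcamp")


class Basebootcamp:
    pass


class Game24bootcamp(Basebootcamp):
    def __init__(self, num_numbers: int = 4, range_max: int = 100):
        self.num_numbers = num_numbers
        self.range_max = range_max


class Helper:
    pass


fake_module.Basebootcamp = Basebootcamp
fake_module.Game24bootcamp = Game24bootcamp
fake_module.Helper = Helper

registry = SimpleBootcampRegistry()
registry.discover(fake_module)
print(registry.names(), registry.info("Game24bootcamp"))
```

Reading parameters from the constructor signature keeps `get_bootcamp_info`-style output in sync with the task code without a hand-maintained schema.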

Evaluation and Metrics

The environment tracks comprehensive metrics:

Performance Metrics

Task accuracy: Success rate on the specific bootcamp task
Format compliance: Rate of properly formatted responses
Reasoning quality: Length and coherence of explanations

Training Metrics

Reward statistics: Mean, std, min, max rewards
Problem diversity: Variety of generated problems
Learning progress: Improvement over time
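The per-group statistics listed above reduce to a few standard-library calls. This is a minimal sketch; the function name `summarize_group` and the metric keys are hypothetical, not the names the environment logs to Wandb.

```python
import statistics
from typing import Dict, Sequence


def summarize_group(
    rewards: Sequence[float], correct: Sequence[bool], formatted: Sequence[bool]
) -> Dict[str, float]:
    """Hypothetical metric summary for one group of scored responses."""
    n = len(rewards)
    return {
        "reward/mean": statistics.fmean(rewards),
        "reward/std": statistics.pstdev(rewards),   # population std over the group
        "reward/min": min(rewards),
        "reward/max": max(rewards),
        "task_accuracy": sum(correct) / n,           # fraction verified correct
        "format_compliance": sum(formatted) / n,     # fraction properly formatted
    }


metrics = summarize_group(
    rewards=[1.0, 1.0, -0.5, -0.5],
    correct=[True, True, False, False],
    formatted=[True, True, True, False],
)
print(metrics)
```

Tracking these per step and comparing them across evaluation intervals gives the "learning progress" signal.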

Troubleshooting

Common Issues

Task Not Found
```
ValueError: Unknown bootcamp: XYZBootcamp
```
Solution: Check available tasks with get_available_bootcamps()
Import Errors
```
ImportError: No module named 'internbootcamp'
```
Solution: Install InternBootcamp with cd internbootcamp_lib && uv pip install -e .
Parameter Errors
```
TypeError: __init__() got an unexpected keyword argument
```
Solution: Check accepted parameters with bootcamp_registry.get_bootcamp_info(task_name)
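Because task names are case-sensitive (for example `Game24bootcamp`, not `Game24Bootcamp`), "Task Not Found" errors are often typos. A small helper built on `difflib.get_close_matches` can suggest the intended name. This is a sketch under assumptions: `suggest_bootcamps`, `resolve_bootcamp`, and the short `KNOWN_TASKS` list are hypothetical; in practice the candidate list would come from `get_available_bootcamps()`.

```python
import difflib
from typing import List, Sequence

# Hypothetical subset; in practice use get_available_bootcamps().
KNOWN_TASKS = ["Game24bootcamp", "Sudokubootcamp", "Mazebootcamp"]


def suggest_bootcamps(name: str, available: Sequence[str], n: int = 3) -> List[str]:
    # Compare case-insensitively, then map matches back to the exact task names.
    lowered = {task.lower(): task for task in available}
    matches = difflib.get_close_matches(name.lower(), list(lowered), n=n, cutoff=0.6)
    return [lowered[m] for m in matches]


def resolve_bootcamp(name: str, available: Sequence[str]) -> str:
    # Return the name unchanged if it exists; otherwise fail with suggestions.
    if name in available:
        return name
    suggestions = suggest_bootcamps(name, available)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unknown bootcamp: {name}.{hint}")
```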

Future Enhancements

Multi-Task Training: Train on multiple bootcamps simultaneously
Curriculum Learning: Progressive difficulty advancement
Task Composition: Combine multiple bootcamps into complex reasoning chains
Custom Bootcamps: Easy integration of new reasoning tasks

Contributing

To add new features or improvements:

Fork the repository
Create a feature branch
Implement your changes following the existing patterns
Add tests for new functionality
Submit a pull request with a clear description

License

This environment follows the same license as the Atropos framework and InternBootcamp library.