Openenv Kernrl RL Env (Community)

RL environment for GPU kernel optimization. Train LLM agents to write fast CUDA/Triton kernels.

Type: RL Env
License: BSD-3-Clause
Published: Jan 2026

kernrl

Overview

Agents receive a PyTorch reference implementation and must write an optimized GPU kernel that:

  1. Produces the same output (within tolerance)
  2. Runs faster than the baseline

Each submission is evaluated with:

  • Compilation checking
  • Correctness verification against reference
  • Benchmark timing for speedup measurement
  • NSight Systems profiling (optional)
  • NSight Compute profiling (optional)
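The correctness and timing stages above can be sketched on the CPU in plain Python. This is only an illustration of the two ideas (a tolerance-based comparison and a baseline-over-candidate time ratio), not kernrl's actual evaluator: the function names, the `rtol`/`atol` values, and the repeat count are assumptions, and real GPU timing would also need device synchronization because kernel launches are asynchronous.

```python
import math
import statistics
import time


def outputs_match(candidate, reference, rtol=1e-5, atol=1e-8):
    """Elementwise closeness test: |c - r| <= atol + rtol * |r|.

    The tolerance values are placeholders, not kernrl's settings.
    """
    if len(candidate) != len(reference):
        return False
    return all(
        abs(c - r) <= atol + rtol * abs(r)
        for c, r in zip(candidate, reference)
    )


def median_runtime(fn, args, repeats=20):
    """Median wall-clock time of fn(*args) over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    # Guard against a zero reading on very coarse clocks.
    return max(statistics.median(samples), 1e-12)


def speedup(baseline_fn, candidate_fn, args):
    """Baseline time divided by candidate time; above 1.0 means faster."""
    return median_runtime(baseline_fn, args) / median_runtime(candidate_fn, args)


def reference_softmax(row):
    """Plain softmax, standing in for the PyTorch reference."""
    exps = [math.exp(v) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(row):
    """Max-subtracted softmax, standing in for the submitted kernel."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


row = [0.5, 1.5, -2.0, 3.0]
print("correct:", outputs_match(stable_softmax(row), reference_softmax(row)))
print("speedup:", speedup(reference_softmax, stable_softmax, (row,)))
```

Using the median rather than the mean keeps a single slow run (for example, a warm-up iteration) from skewing the ratio.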

Installation

cd envs/kernrl
pip install -e .

Requires: NVIDIA GPU with CUDA toolkit, PyTorch, Triton

Quick Start

from kernrl import KernelAction, kernrl_env

# Connect to server
env = kernrl_env(base_url="http://localhost:8000")

# Start episode
obs = env.reset(problem_id="L1_23_Softmax")
print(obs.problem_description)

# Submit a kernel
action = KernelAction(code='''
import torch
import triton
import triton.language as tl

@triton.jit
def softmax_kernel(input_ptr, output_ptr, n_cols, BLOCK_SIZE: tl.constexpr):
    row_idx = tl.program_id(0)
    col_offsets = tl.arange(0, BLOCK_SIZE)
    mask = col_offsets < n_cols

    row_start = row_idx * n_cols
    row = tl.load(input_ptr + row_start + col_offsets, mask=mask, other=-float('inf'))

    row_max = tl.max(row, axis=0)
    row = row - row_max
    numerator = tl.exp(row)
    denominator = tl.sum(numerator, axis=0)
    softmax_output = numerator / denominator

    tl.store(output_ptr + row_start + col_offsets, softmax_output, mask=mask)

class Model(torch.nn.Module):
    def forward(self, x):
        n_rows, n_cols = x.shape
        output = torch.empty_like(x)
        BLOCK_SIZE = triton.next_power_of_2(n_cols)
        softmax_kernel[(n_rows,)](x, output, n_cols, BLOCK_SIZE=BLOCK_SIZE)
        return output
''')

result = env.step(action)
print(f"Speedup: {result.observation.speedup}x")
print(f"Correct: {result.observation.correctness_pass}")
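When an agent produces several candidate kernels, one way to compare them is to step the environment with each and keep the fastest correct result. The helper below is a hypothetical sketch that relies only on `env.step(...)`, `result.observation.speedup`, and `result.observation.correctness_pass` as used above. Whether a single episode accepts multiple steps is an assumption, and `StubEnv` is a stand-in so the example runs without a server or GPU.

```python
from types import SimpleNamespace


def best_correct_speedup(env, actions):
    """Step env with each action; return (index, speedup) of the fastest
    correct submission, or None if no submission was correct."""
    best = None
    for index, action in enumerate(actions):
        obs = env.step(action).observation
        if not obs.correctness_pass:
            continue  # ignore incorrect kernels regardless of timing
        if best is None or obs.speedup > best[1]:
            best = (index, obs.speedup)
    return best


class StubEnv:
    """Stand-in for a kernrl client; replays canned (correct, speedup) pairs."""

    def __init__(self, outcomes):
        self._outcomes = iter(outcomes)

    def step(self, action):
        correct, speedup = next(self._outcomes)
        observation = SimpleNamespace(correctness_pass=correct, speedup=speedup)
        return SimpleNamespace(observation=observation)


env = StubEnv([(True, 0.9), (False, None), (True, 1.7)])
print(best_correct_speedup(env, ["kernel_a", "kernel_b", "kernel_c"]))  # (2, 1.7)
```

With a real client, `actions` would be a list of `KernelAction` objects instead of the placeholder strings.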

Running the Server

# Development
uvicorn kernrl.server.app:app --reload --host 0.0.0.0 --port 8000

# Docker (GPU required)
cd envs/kernrl
docker build -t kernrl -f server/Dockerfile .
docker run --gpus all -p 8000:8000 kernrl

Problem Levels

| Level | Name | Count | Description |
|-------|------|-------|-------------|
| 1 | Simple Operators | 15 | matmul, softmax, conv, norms |
| 2 | Fused Operations | 15 | matmul+activation chains |
| 3 | Single Blocks | 3 | attention, transformer block |
| 4 | Novel Layers | 8 | MLA, MoE, GQA, FP8, INT4 |
| 5 | Scientific Computing | 8 | N-body, stencil, SpMV |
| 6 | Graphics | 8 | ray tracing, histogram, blur |
| 7 | Signal Processing | 8 | FFT, convolution, median filter |
| 8 | Video Processing | 8 | motion estimation, optical flow |
| 9 | Parallel Primitives | 8 | scan, reduction, radix sort |
| 10 | Cryptography | 8 | SHA-256, AES, ChaCha20 |

Total: 89 problems
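The per-level counts can be kept as data and checked against the stated total. The sketch below also splits a problem ID such as `L1_23_Softmax` into its parts. The `L<level>_<number>_<Name>` pattern is inferred from that single Quick Start example and is an assumption, as are the names `LEVEL_COUNTS` and `parse_problem_id`; note that the number in the example (23) exceeds the level-1 count (15), so it is evidently not a position within the level.

```python
import re

# Problem counts per level, copied from the table above.
LEVEL_COUNTS = {1: 15, 2: 15, 3: 3, 4: 8, 5: 8, 6: 8, 7: 8, 8: 8, 9: 8, 10: 8}

# Assumed ID layout: "L<level>_<number>_<Name>" (inferred, not documented).
_ID_PATTERN = re.compile(r"^L(\d+)_(\d+)_(\w+)$")


def parse_problem_id(problem_id):
    """Split an ID into (level, number, name); raise ValueError if it does not fit."""
    match = _ID_PATTERN.match(problem_id)
    if match is None:
        raise ValueError(f"unrecognized problem id: {problem_id!r}")
    level, number, name = match.groups()
    return int(level), int(number), name


print(sum(LEVEL_COUNTS.values()))         # 89
print(parse_problem_id("L1_23_Softmax"))  # (1, 23, 'Softmax')
```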

Reward Structure

Rewards are designed so that only a speedup above 1.0x over the baseline yields a positive reward. Compiling and passing the correctness check are necessary but not sufficient: on their own they earn at most zero.

| Condition | Reward | Description |
|-----------|--------|-------------|
| Compilation failure | -0.5 | Penalty for code that doesn't compile |
| Correctness failure | -0.25 | Penalty for incorrect output |
| Correct but slower | (speedup - 1.0) * 0.5 | Small negative for being slower than baseline |
| Correct and faster | min(speedup - 1.0, 2.0) | Positive, capped at 2.0 |

Examples:

  • Compile fail: reward = -0.5
  • Compiles, wrong output: reward = -0.25
  • Compiles, correct, 0.8x speed: reward = -0.1
  • Compiles, correct, 1.0x speed: reward = 0.0
  • Compiles, correct, 1.5x speed: reward = 0.5
  • Compiles, correct, 3.0x speed: reward = 2.0 (capped)
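The reward rules can be written as a single function. This is a sketch derived from the table and examples in this section, not the environment's source; the function name and signature are hypothetical.

```python
def compute_reward(compiled, correct, speedup=None):
    """Reward for one submission, following the table in this section.

    speedup is baseline time over kernel time and is only consulted
    when the kernel compiled and produced correct output.
    """
    if not compiled:
        return -0.5
    if not correct:
        return -0.25
    if speedup <= 1.0:
        # Slower than (or equal to) baseline: small penalty, zero at 1.0x.
        return (speedup - 1.0) * 0.5
    # Faster than baseline: linear in the gain, capped at 2.0.
    return min(speedup - 1.0, 2.0)


print(compute_reward(False, False))      # -0.5
print(compute_reward(True, False))       # -0.25
print(compute_reward(True, True, 1.0))   # 0.0
print(compute_reward(True, True, 1.5))   # 0.5
print(compute_reward(True, True, 3.0))   # 2.0
```

One consequence of these rules: a correct kernel slower than 0.5x the baseline scores below -0.25, so it is penalized more than an incorrect one.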

Security Considerations

Warning: This environment executes user-submitted kernel code with full Python/CUDA privileges. While Docker provides container isolation, there is no sandboxing within the container for:

  • Filesystem access
  • Network requests
  • Resource consumption (GPU memory, CPU)
  • Module imports

This is acceptable for trusted research environments. For production deployments, or whenever submissions come from untrusted sources, add further isolation measures.
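As one small example of an additional measure, submitted code can be run in a child process with a wall-clock timeout and a CPU-time limit. This is a minimal sketch using only the Python standard library, and it is not part of kernrl: the function name and default limits are assumptions. It bounds runaway computation only; it does not restrict filesystem access, network requests, GPU memory, or module imports.

```python
import subprocess
import sys

try:
    import resource  # available on Unix-like systems only
except ImportError:
    resource = None


def run_with_limits(code, wall_seconds=5.0, cpu_seconds=5):
    """Run a Python snippet in a child interpreter with basic time limits."""

    def apply_limits():
        # Executed in the child just before exec: cap its CPU time.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    try:
        completed = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=wall_seconds,
            preexec_fn=apply_limits if resource else None,
        )
    except subprocess.TimeoutExpired:
        return {"timed_out": True, "returncode": None, "stdout": ""}
    return {
        "timed_out": False,
        "returncode": completed.returncode,
        "stdout": completed.stdout,
    }


print(run_with_limits("print('ok')"))
print(run_with_limits("while True: pass", wall_seconds=1.0)["timed_out"])
```

Stronger isolation (separate containers per submission, network-disabled containers, read-only filesystems) sits outside the Python process and is not shown here.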

License

BSD-3-Clause (following OpenEnv licensing)