Query ENV RL Env (Community)

Train LLMs to generate correct SQL queries from natural language questions.

Type: RL Env
License: MIT
Published: Jan 2026

SQL Query Generation Environment

Overview

This environment uses the Salesforce/WikiSQL dataset to train language models on text-to-SQL tasks. Each generated query is verified by executing it against an in-memory SQLite database and comparing its result with the result of the ground-truth SQL.
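A minimal sketch of execution-based checking with Python's built-in sqlite3 module is shown below. The helper names run_sql and same_result are hypothetical, and the comparison rule (order-insensitive row matching) is an assumption; sql_executor.py may normalize results differently.

```python
import sqlite3
from typing import Any, Optional


def run_sql(conn: sqlite3.Connection, sql: str) -> Optional[list[tuple[Any, ...]]]:
    """Execute a query and return all rows, or None if SQLite rejects it."""
    try:
        return conn.execute(sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        # Syntax errors, unknown columns, multiple statements, etc.
        return None


def same_result(conn: sqlite3.Connection, generated_sql: str, gold_sql: str) -> bool:
    """True if the generated query runs and returns the same rows as the gold query.

    Rows are compared as sorted repr strings, so row order is ignored
    (an assumed normalization; the real executor may differ).
    """
    gold = run_sql(conn, gold_sql)
    pred = run_sql(conn, generated_sql)
    if gold is None or pred is None:
        return False
    return sorted(map(repr, pred)) == sorted(map(repr, gold))
```

Sorting repr strings keeps mixed values such as None and numbers from raising during the sort, at the cost of treating 1 and 1.0 as different values.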

Dataset

  • Source: Salesforce/WikiSQL
  • Size: 80,654 examples (train + validation + test)
  • Format: Natural language questions with table schemas and ground truth SQL
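In the original WikiSQL release, the ground-truth SQL is stored as a logical form (a selected column index, an aggregation index, and a list of [column, operator, value] conditions) rather than as a query string. The sketch below renders such a form as SQLite text. The dict layout, the operator tables, and the helpers wikisql_to_sql, quote_ident, and quote_value are assumptions based on that release, not on wikisql_loader.py.

```python
# Operator tables as used in the original WikiSQL release (assumed here).
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def quote_ident(name: str) -> str:
    """Double-quote a column name, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_value(value) -> str:
    """Render a literal: numbers as-is, everything else as a quoted string."""
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def wikisql_to_sql(sql: dict, header: list[str], table: str = "data") -> str:
    """Hypothetical: turn {"sel", "agg", "conds"} into a SQL string."""
    col = quote_ident(header[sql["sel"]])
    agg = AGG_OPS[sql["agg"]]
    select = f"{agg}({col})" if agg else col
    query = f"SELECT {select} FROM {table}"
    conds = [
        f"{quote_ident(header[c])} {COND_OPS[o]} {quote_value(v)}"
        for c, o, v in sql["conds"]
    ]
    if conds:
        query += " WHERE " + " AND ".join(conds)
    return query
```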

Usage

Training Mode (with API Server)

# Terminal 1: Start the Atropos API
run-api

# Terminal 2: Run the environment
python sql_query_env.py serve --slurm False

Local Testing (without API)

python sql_query_env.py process --env.data_path_to_save_groups sql_output.jsonl

This generates sql_output.jsonl and sql_output.html for inspection.
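To inspect sql_output.jsonl programmatically, a small reader like the one below can count groups and average their scores. It assumes one JSON object per line and a per-group "scores" list; both are assumptions about the output schema, and summarize_groups is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_groups(path: str) -> dict:
    """Count groups in a JSONL file and average any per-group "scores" lists.

    Records without a "scores" key are counted as groups but contribute
    no scores (the key name is an assumption about the output format).
    """
    n_groups = 0
    scores: list[float] = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            n_groups += 1
            scores.extend(record.get("scores") or [])
    mean = sum(scores) / len(scores) if scores else None
    return {"groups": n_groups, "responses_scored": len(scores), "mean_score": mean}
```

Usage: summarize_groups("sql_output.jsonl") after running the process command.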

With Local vLLM Server

python sql_query_env.py process \
    --env.data_path_to_save_groups sql_output.jsonl \
    --openai.base_url http://localhost:9001/v1 \
    --openai.model_name YOUR_MODEL_NAME
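vLLM's OpenAI-compatible server lists the models it serves at the /models endpoint under the base URL, which is a quick way to confirm the value to pass as --openai.model_name. The helpers below are a hypothetical sketch using only the standard library.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the /models endpoint URL from an OpenAI-compatible base URL."""
    return base_url.rstrip("/") + "/models"


def served_model_ids(base_url: str, timeout: float = 5.0) -> list[str]:
    """Return the model IDs reported by the server (requires a running server)."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    # OpenAI-style list response: {"object": "list", "data": [{"id": ...}, ...]}
    return [m["id"] for m in payload.get("data", [])]
```

With the server from the command above running, served_model_ids("http://localhost:9001/v1") returns the names it accepts.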

Reward Function

Score | Condition
1.0 | Generated SQL executes and returns the same result as the gold SQL
-1.0 | SQL fails to execute or returns an incorrect result
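The reward rule in the table can be written as a single function. This is a sketch: score_response is a hypothetical name, treating a missing \boxed{} answer as -1.0 is an assumption, and rows are compared order-insensitively.

```python
import sqlite3
from typing import Optional


def score_response(conn: sqlite3.Connection, generated_sql: Optional[str], gold_sql: str) -> float:
    """Return 1.0 if the generated SQL reproduces the gold result, else -1.0."""
    if not generated_sql:
        return -1.0  # no SQL extracted from \boxed{} (assumed to be a failure)
    try:
        pred = conn.execute(generated_sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        return -1.0  # SQL failed to execute
    gold = conn.execute(gold_sql).fetchall()
    # Order-insensitive comparison of result rows (assumed normalization).
    return 1.0 if sorted(map(repr, pred)) == sorted(map(repr, gold)) else -1.0
```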

When all responses in a group are correct, a length penalty is applied to encourage concise solutions.
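The exact length penalty is not specified here, so the following is a hypothetical linear version: within an all-correct group, the shortest response keeps 1.0 and the longest falls to 1.0 - max_penalty. A plausible reason for penalizing only all-correct groups is that identical scores give zero group-relative advantage, so a length term restores some training signal.

```python
def apply_length_penalty(
    scores: list[float], lengths: list[int], max_penalty: float = 0.5
) -> list[float]:
    """Hypothetical penalty, applied only when every response scored 1.0.

    Scores fall linearly from 1.0 (shortest) to 1.0 - max_penalty (longest).
    Both the formula and the max_penalty default are assumptions.
    """
    if not scores or any(s != 1.0 for s in scores):
        return list(scores)  # mixed group: leave rewards unchanged
    shortest, longest = min(lengths), max(lengths)
    if longest == shortest:
        return list(scores)  # all the same length: nothing to differentiate
    return [
        1.0 - max_penalty * (n - shortest) / (longest - shortest) for n in lengths
    ]
```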

Prompt Format

The model receives a table schema and question:

Table: data
Columns: col1, col2, col3
Sample data:
  value1 | value2 | value3

Question: What is the value of col1 where col2 equals X?
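A prompt in the layout above could be assembled as follows. build_prompt is a hypothetical helper; how many sample rows the environment includes, and any surrounding system prompt, are not specified here.

```python
def build_prompt(
    columns: list[str], rows: list[list[str]], question: str, table: str = "data"
) -> str:
    """Render table schema, sample rows, and question in the layout shown above."""
    lines = [f"Table: {table}", "Columns: " + ", ".join(columns), "Sample data:"]
    # Each sample row is indented and pipe-separated.
    lines += ["  " + " | ".join(str(v) for v in row) for row in rows]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```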

The model should reason inside <think> tags and put its final SQL query in \boxed{}:

<think>
[Chain of thought reasoning]
</think>

\boxed{SELECT col1 FROM data WHERE col2 = 'X'}
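Extracting the answer means finding the last \boxed{...} and matching braces, since a naive regex stops at the first "}". The sketch below is an assumption about how extraction could work, not the code in sql_executor.py; extract_boxed is a hypothetical name.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1  # we are already inside the opening brace
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j].strip()
    return None  # unbalanced braces: treat as no answer
```

Taking the last box skips any \boxed{} drafted inside the <think> block. An unbalanced brace inside a SQL string literal (for example '}') would still end the match early.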

Unit Tests

# Run unit tests
python -m pytest test_sql_executor.py -v

The 19 tests cover:

  • Table creation with special column names
  • SQL execution and error handling
  • \boxed{} extraction patterns
  • Result comparison and normalization
  • End-to-end scoring integration
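WikiSQL headers often contain spaces, punctuation, or quotes, so column names must be quoted before they can appear in CREATE TABLE. A minimal sketch with sqlite3 follows; create_table and its types argument (WikiSQL-style "text"/"real" labels) are hypothetical, not the signatures in sql_executor.py.

```python
import sqlite3
from typing import Optional


def quote_ident(name: str) -> str:
    """Quote an SQLite identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table(
    conn: sqlite3.Connection,
    header: list[str],
    rows: list[tuple],
    types: Optional[list[str]] = None,
    table: str = "data",
) -> None:
    """Create a table with quoted column names and insert rows via placeholders."""
    types = types or ["text"] * len(header)
    cols = ", ".join(f"{quote_ident(h)} {t.upper()}" for h, t in zip(header, types))
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({cols})")
    # Values go through ? placeholders, so no manual escaping is needed.
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", rows
    )
```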

LLM Integration Test

The environment has been verified with Qwen3-8B on an NVIDIA H200:

# Run integration test with a local vLLM server
python test_integration.py --base_url http://localhost:8000/v1 --model Qwen/Qwen3-8B

Test results:

  • 40% accuracy on 10 random WikiSQL examples
  • SQL extraction from \boxed{} working correctly
  • Execution-based scoring producing correct reward signals

Files

File | Description
sql_query_env.py | Main environment implementation
sql_executor.py | SQLite execution and scoring utilities
wikisql_loader.py | WikiSQL dataset loader (from GitHub)
test_sql_executor.py | Unit tests (19 tests)
test_integration.py | LLM integration test

Author

Community contribution to Atropos.