0

Winning Probability RL Env (Community)

This environment is designed to analyze and predict winning probabilities in various solitaire-style games using both theoretical mathematical analysis and empirical simulation.

Type
RL Env
License
mit
Published
May 2025

Cite

Notes

Only stored in your browser.

Solitaire Winning Probability Environment

This environment is designed to analyze and predict winning probabilities in various solitaire-style games using both theoretical mathematical analysis and empirical simulation.

Overview

The system combines two approaches to determine game winning probabilities:

  1. Theoretical Analysis: Uses AI to derive mathematical formulas for exact probability calculations
  2. Empirical Simulation: Runs Monte Carlo simulations to verify theoretical predictions

Key Components

GamePredictor Class

The core component that handles:

  • AI-powered probability analysis
  • Mathematical formula evaluation
  • Game simulation
  • Probability comparison between theoretical and empirical results

Features

  • AI Analysis: Uses LLM to analyze game mechanics and derive mathematical formulas
  • Formula Evaluation: Supports complex mathematical expressions including:
    • Factorials
    • Combinations (C(n,r))
    • Permutations (P(n,r))
    • Standard mathematical operations
  • Simulation Engine: Runs multiple game simulations to verify theoretical predictions
  • QA Dataset Generation: Creates training data for AI models by generating question-answer pairs

Reward Function

The environment implements a sophisticated reward function that evaluates the quality of probability predictions:

  1. Base Reward Calculation:

    • Compares the predicted probability with the ground truth probability
    • Calculates the relative error: 1 - min(abs(gt - predicted) / gt, 2)
    • Adds a small bonus of 0.2 for valid predictions
    • Clips the final reward between -1 and 1
  2. Length Penalty:

    • Applies a length-based penalty for responses that exceed 50% of the maximum token length
    • No penalty for responses under the threshold
    • Linear scaling of penalty based on response length
    • Helps encourage concise and efficient solutions
  3. Validation Checks:

    • Verifies proper formula formatting and syntax
    • Ensures responses contain valid mathematical expressions
    • Handles edge cases and invalid responses gracefully
  4. Quality Metrics:

    • Tracks percentage of correct predictions
    • Monitors response lengths and quality
    • Provides feedback for model improvement

Usage

# Initialize the predictor
predictor = GamePredictor(openai_api_key, openai_api_base)

# Define games to analyze
games = {
    'game_name': game_function,
    # ... more games
}

# Get predictions for all games
results = await predictor.predict_games(games)

# Generate QA dataset
await predictor.generate_qa_csv(games, n_simulations, "output.csv")

Output Format

The system provides comprehensive analysis for each game:

  • AI's mathematical reasoning
  • Derived probability formula
  • Calculated theoretical probability
  • Simulated empirical probability
  • Comparison assessment between theory and simulation

Supported Games

The environment includes several example games:

  • Easy games (1-4)
  • Card matching games (2-4 cards)
  • Odd card game

Requirements

  • Python 3.x
  • OpenAI API access
  • Required packages:
    • openai
    • asteval
    • asyncio

Purpose

This environment serves multiple purposes:

  1. Educational: Demonstrates probability theory in practical game scenarios
  2. Research: Provides a framework for analyzing game mechanics
  3. AI Training: Generates datasets for training AI models in probability analysis
  4. Verification: Validates theoretical probability calculations through simulation

Contributing

New games can be added by implementing game functions that return a boolean indicating win/loss. The system will automatically analyze and provide probability predictions for any valid game implementation.