
Humor Generation RL Env (Community)

Humor Generation RL Env (Community) is an RL environment.

Type: RL Env
License: MIT
Published: May 2025

Humor Generation Environment

Overview

A reinforcement learning environment for training language models to generate humor in the style of specific comedians and in specific formats. The environment uses a multi-dimensional scoring rubric to evaluate joke quality across relevance, style consistency, creativity, humor effectiveness, virality, and cognitive coherence.

Features

  • Multi-Comedian Training: Supports various comedian styles (Norm Macdonald, John Mulaney, Hasan Minhaj, Dave Chappelle, Ali Wong, Chris Rock)
  • Format Diversity: Trains on different humor formats (haiku, one-liner, q/a over SMS)
  • Comprehensive Scoring: 6-dimensional evaluation rubric for joke quality assessment
  • Dataset Generation: Automated dataset creation using GPT-4o-mini
  • WandB Integration: Comprehensive experiment tracking and visualization

Environment Structure

  • humor_env.py: Main environment implementation with scoring logic
  • generate_humor_dataset.py: Script for creating training datasets
  • humor_dataset.jsonl: Pre-generated dataset with comedian/format combinations
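The comedian and format lists under Features suggest the dataset is built from comedian/format pairings. The sketch below shows one way to enumerate them. It assumes every comedian is paired with every format. The function build_prompt_grid and the question template are hypothetical and are not taken from generate_humor_dataset.py.

```python
from itertools import product

# Values listed under "Features"; format strings mirror the README's wording.
COMEDIANS = [
    "Norm Macdonald", "John Mulaney", "Hasan Minhaj",
    "Dave Chappelle", "Ali Wong", "Chris Rock",
]
FORMATS = ["haiku", "one-liner", "q/a over SMS"]


def build_prompt_grid(comedians, formats):
    """Return one record stub per (comedian, format) pair.

    Hypothetical: the question template is a placeholder, not the wording
    used by generate_humor_dataset.py.
    """
    grid = []
    for comedian, fmt in product(comedians, formats):
        question = (
            f"Which model would best write a {fmt} joke in the style of "
            f"{comedian}? Include an example joke."
        )
        grid.append({"comedian": comedian, "format": fmt, "question": question})
    return grid


grid = build_prompt_grid(COMEDIANS, FORMATS)
print(len(grid))  # 18 (6 comedians x 3 formats)
```

itertools.product iterates the last list fastest, so records for each comedian stay contiguous. That layout makes it easy to spot gaps per comedian.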

Scoring Rubric

The environment evaluates generated jokes across six dimensions, each scored on the range shown below (0-2 or 0-3 points):

  1. Relevance to Format (0-2): How well the joke fits the specified format
  2. Style Consistency (0-2): Adherence to the target comedian's style
  3. Creativity (0-3): Originality and inventiveness of the humor
  4. Humor Effectiveness (0-3): How funny and engaging the joke is
  5. Virality (0-3): Potential for widespread appeal and sharing
  6. Cognitive Coherence (0-3): Logical structure and comprehensibility
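The rubric is applied by an LLM judge, so its reply must be turned into numbers. The sketch below parses a judge reply and aggregates the scores. It assumes the judge writes one "Dimension: N" line per dimension, and it assumes the reward is the total divided by the maximum possible total. Neither the reply format nor the normalization is confirmed by humor_env.py. The names parse_scores and normalized_reward are hypothetical.

```python
import re

# Maximum points per dimension, taken from the rubric list above.
RUBRIC_MAX = {
    "Relevance to Format": 2,
    "Style Consistency": 2,
    "Creativity": 3,
    "Humor Effectiveness": 3,
    "Virality": 3,
    "Cognitive Coherence": 3,
}


def parse_scores(judge_text):
    """Extract 'Dimension: N' lines from a judge reply (assumed output format)."""
    scores = {}
    for name, max_points in RUBRIC_MAX.items():
        match = re.search(rf"{re.escape(name)}\s*:\s*(\d+)", judge_text, re.IGNORECASE)
        if match is None:
            raise ValueError(f"missing score for {name!r}")
        value = int(match.group(1))
        # Reject scores outside the range the rubric allows for this dimension.
        if not 0 <= value <= max_points:
            raise ValueError(f"{name!r} score {value} outside 0-{max_points}")
        scores[name] = value
    return scores


def normalized_reward(scores):
    """Sum the six scores and scale to [0, 1] by the maximum possible total."""
    return sum(scores.values()) / sum(RUBRIC_MAX.values())


reply = """Relevance to Format: 2
Style Consistency: 1
Creativity: 3
Humor Effectiveness: 2
Virality: 1
Cognitive Coherence: 3"""
scores = parse_scores(reply)
print(normalized_reward(scores))  # 0.75 (12 of 16 points)
```

Raising on a missing or out-of-range score makes a malformed judge reply visible instead of silently producing a low reward.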

Usage

Running the Environment

python environments/community/humor_generation/humor_env.py serve

Generating New Datasets

cd environments/community/humor_generation/
python generate_humor_dataset.py

Configuration

  • Model: GPT-4o-mini for both generation and evaluation
  • Group Size: 2 completions per prompt
  • Max Tokens: 2048 for joke generation, 512 for scoring
  • Evaluation: LLM-based scoring using detailed rubric prompts
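The settings above can be collected into a single object, as in the sketch below. HumorEnvSettings and its field names are hypothetical and are not the Atropos configuration class. Only the values (gpt-4o-mini, group size 2, 2048 and 512 tokens) come from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HumorEnvSettings:
    """Hypothetical container for the values listed under Configuration."""

    model_name: str = "gpt-4o-mini"  # used for both generation and evaluation
    group_size: int = 2              # completions sampled per prompt
    max_joke_tokens: int = 2048      # token budget for joke generation
    max_judge_tokens: int = 512      # token budget for rubric scoring

    def total_completions(self, num_prompts: int) -> int:
        """Number of joke completions needed for a batch of prompts."""
        return num_prompts * self.group_size


settings = HumorEnvSettings()
print(settings.total_completions(18))  # 36
```

Making the dataclass frozen keeps the settings from being changed mid-run.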

Requirements

  • OpenAI API key (set as OPENAI_API_KEY environment variable)
  • Standard Atropos dependencies
  • WandB account for experiment tracking

Dataset Format

Each record contains:

  • comedian: Target comedian style
  • format: Humor format (haiku, one-liner, q/a over SMS)
  • question: Prompt asking for model recommendations and example jokes
  • response: GPT-4o-mini-generated response with explanations and examples
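A quick check that a JSONL file matches the record layout above is sketched below. It assumes one JSON object per line and tolerates blank lines. The function load_records and the sample record are illustrative placeholders, not part of the environment.

```python
import json

# The four fields documented under "Dataset Format".
REQUIRED_FIELDS = ("comedian", "format", "question", "response")


def load_records(lines):
    """Parse JSONL lines and check each record has the four documented fields."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"line {lineno}: missing {missing}")
        records.append(record)
    return records


# Placeholder record for illustration only.
sample = [
    json.dumps({"comedian": "Ali Wong", "format": "haiku",
                "question": "placeholder question",
                "response": "placeholder response"}),
    "",
]
print(len(load_records(sample)))  # 1
```

To check the shipped file, run the following from the environment directory: `with open("humor_dataset.jsonl", encoding="utf-8") as fh: records = load_records(fh)`.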

Training Applications

  • Style Transfer: Learning to mimic specific comedian voices
  • Format Adaptation: Generating humor in constrained formats
  • Quality Assessment: Training models to evaluate humor effectiveness
  • Creative Writing: Developing AI systems for entertainment content creation