
Discrete MATH RL Env (Camel AI)


This environment contains 178 seed questions (53 training, 125 test) on graph-based discrete math. The seed data are real, human-vetted examples collected from the NetworkX library's documentation.

Type: RL Env
Publisher: Camel AI
Runtime: single-turn
License: unknown
Size: v0.1.0
Published: Aug 2025


loong-seed-graph-discrete-math


Overview

  • Environment ID: loong-seed-graph-discrete-math
  • Short description: This environment contains 178 seed questions (53 training, 125 test) on graph-based discrete math. The seed data are real, human-vetted examples collected from the NetworkX library's documentation.
  • Tags: reasoning, graph theory
  • Contributors: CAMEL-AI.org

Datasets

Task

  • Type: single-turn
  • Parser: The extractor returns the last occurrence of text following "Final Answer:" (case-insensitive) from the input string, or an empty string if none is found.
  • Rubric overview: The rubric awards a score of 1.0 if the model’s response and the reference answer are equivalent—recursively comparing numbers (with float tolerance), lists, tuples, and dictionaries—and 0.0 otherwise.

| Criterion | Description | Reward |
| --- | --- | --- |
| Float comparison | If both response and answer are floats, they are considered correct if they are equal within a small tolerance (1e-6) using math.isclose. | 1.0 if within tolerance, else 0.0 |
| List/Tuple comparison | Both must be lists or tuples of the same length, with elements recursively compared. | 1.0 if all elements match, else 0.0 |
| Dictionary comparison | Both must be dicts with the same keys; values are recursively compared. | 1.0 if all values match, else 0.0 |
| String/Other comparison | All other types are compared directly with ==. | 1.0 if equal, else 0.0 |
| Parsing | Both response and answer are parsed via ast.literal_eval if possible. If parsing fails, raw string comparison is used. | Enables structured comparisons for numbers, lists, tuples, and dicts. |
| Recursive checking | Nested structures are compared recursively, so lists/dicts containing floats or other lists/dicts are handled. | Ensures correctness at all levels of the data structure. |
| Final reward | The function outputs 1.0 if deep_compare returns True, else 0.0. | Numerical reward for model evaluation. |
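
The parser and rubric described above can be sketched roughly as follows. This is a minimal illustration, not the environment's actual source: only deep_compare, ast.literal_eval, math.isclose, and the 1e-6 tolerance come from the description. The helper names extract_final_answer, parse, and reward are hypothetical, as are the assumptions that the answer sits on the same line as "Final Answer:", that the tolerance applies both relatively and absolutely, and that a list may match a tuple.

```python
import ast
import math
import re


def extract_final_answer(text: str) -> str:
    """Return the text after the last 'Final Answer:' (case-insensitive), or ''.

    Assumption: the answer is on the same line as the marker.
    """
    matches = re.findall(r"final answer:\s*(.*)", text, flags=re.IGNORECASE)
    return matches[-1].strip() if matches else ""


def parse(value):
    """Try ast.literal_eval; fall back to the raw string if parsing fails."""
    if not isinstance(value, str):
        return value
    try:
        return ast.literal_eval(value.strip())
    except (ValueError, SyntaxError):
        return value


def deep_compare(a, b, tol: float = 1e-6) -> bool:
    """Recursively compare two parsed values, with tolerance for floats."""
    if isinstance(a, float) and isinstance(b, float):
        # Assumption: the same tolerance is used for rel_tol and abs_tol.
        return math.isclose(a, b, rel_tol=tol, abs_tol=tol)
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return len(a) == len(b) and all(
            deep_compare(x, y, tol) for x, y in zip(a, b)
        )
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(
            deep_compare(a[k], b[k], tol) for k in a
        )
    # Strings and every other type fall back to plain equality.
    return a == b


def reward(response: str, answer: str) -> float:
    """1.0 if the extracted response matches the reference answer, else 0.0."""
    parsed_response = parse(extract_final_answer(response))
    return 1.0 if deep_compare(parsed_response, parse(answer)) else 0.0
```

Under these assumptions, reward("Some reasoning\nFinal Answer: [1, 2.0000001]", "[1, 2.0]") evaluates to 1.0, because the two floats differ by less than the tolerance, while a response with no "Final Answer:" marker extracts to an empty string and scores 0.0 against any non-empty reference.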

Quickstart

Run an evaluation with default settings:

uv run vf-eval loong-seed-graph-discrete-math

Configure model and sampling:

uv run vf-eval loong-seed-graph-discrete-math -m gpt-4.1-mini -n 20 -r 3 -t 1024 -T 0.7 -a '{"key": "value"}'  # env-specific args as JSON

Notes:

  • Use -a / --env-args to pass environment-specific configuration as a JSON object.

Metrics


| Metric | Meaning |
| --- | --- |
| reward | Main scalar reward (weighted sum of criteria) |