loong-seed-graph-discrete-math


Environment ID: loong-seed-graph-discrete-math
Short description: This environment contains 178 seed questions (53 training, 125 test) in the graph and discrete math domain. The seed data are real, human-vetted problems collected from the NetworkX library's documentation.
Tags: reasoning, graph theory
Contributors: CAMEL-AI.org

Type: single-turn
Parser: The extractor returns the last occurrence of text following "Final Answer:" (case-insensitive) from the input string, or an empty string if none is found.
Rubric overview: The rubric awards a score of 1.0 if the model's response and the reference answer are equivalent, and 0.0 otherwise. Equivalence is checked by recursively comparing numbers (with float tolerance), lists, tuples, and dictionaries.
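
The parser behavior described above can be sketched roughly as follows. This is a minimal illustration, not the environment's actual code: the function name `extract_final_answer` is hypothetical, and trimming surrounding whitespace from the extracted text is an assumption.

```python
import re


def extract_final_answer(text: str) -> str:
    """Return the text after the last 'Final Answer:' marker, or '' if absent.

    Hypothetical helper; stripping whitespace is an assumption.
    """
    # Find every occurrence of the marker, ignoring case.
    matches = list(re.finditer(r"final answer:", text, flags=re.IGNORECASE))
    if not matches:
        return ""
    # Keep everything that follows the last marker.
    return text[matches[-1].end():].strip()


print(extract_final_answer("Draft. Final Answer: 3\nOn reflection, final answer: [1, 2]"))
```

Taking the last occurrence means a model that revises its answer mid-response is scored on its final statement rather than an earlier draft.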

Criterion	Description	Reward / Effect
Float comparison	If both response and answer are floats, they are considered correct if they are equal within a small tolerance (`1e-6`) using `math.isclose`.	1.0 if within tolerance, else 0.0
List/Tuple comparison	Both must be lists or tuples of the same length, with elements recursively compared.	1.0 if all elements match, else 0.0
Dictionary comparison	Both must be dicts with the same keys; values are recursively compared.	1.0 if all values match, else 0.0
String/Other comparison	All other types are compared directly with `==`.	1.0 if equal, else 0.0
Parsing	Both response and answer are parsed via `ast.literal_eval` if possible. If parsing fails, raw string comparison is used.	Enables structured comparisons for numbers, lists, tuples, and dicts.
Recursive checking	Nested structures are compared recursively, so lists/dicts containing floats or other lists/dicts are handled.	Ensures correctness at all levels of the data structure.
Final reward	The function outputs `1.0` if `deep_compare` returns `True`, else `0.0`.	Numerical reward for model evaluation.
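
The criteria in the table above can be sketched as a single recursive comparison. This is a hedged approximation rather than the environment's implementation: only `deep_compare`, `ast.literal_eval`, `math.isclose`, and the `1e-6` tolerance come from the table. The helper names `_parse` and `reward`, applying the tolerance as both relative and absolute, and letting a list match a tuple are all assumptions.

```python
import ast
import math

TOLERANCE = 1e-6  # from the table; relative vs. absolute is an assumption


def _parse(value):
    """Parse a string into a Python literal, falling back to the raw string."""
    if not isinstance(value, str):
        return value
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value


def deep_compare(a, b) -> bool:
    """Recursively compare two parsed values following the table's criteria."""
    if isinstance(a, float) and isinstance(b, float):
        return math.isclose(a, b, rel_tol=TOLERANCE, abs_tol=TOLERANCE)
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        # Assumption: a list may match a tuple with the same elements.
        return len(a) == len(b) and all(deep_compare(x, y) for x, y in zip(a, b))
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(deep_compare(a[k], b[k]) for k in a)
    # Strings and all other types are compared directly.
    return a == b


def reward(response: str, answer: str) -> float:
    """Return 1.0 if the parsed response matches the parsed answer, else 0.0."""
    return 1.0 if deep_compare(_parse(response), _parse(answer)) else 0.0


print(reward("[1, 2.0000001]", "[1, 2.0]"))
print(reward("{'a': [1, 2]}", "{'a': [1, 3]}"))
```

Parsing both sides before comparing lets answers such as `[0.5, 1.0]` be judged numerically instead of by exact string match.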

Run an evaluation with default settings:

uv run vf-eval loong-seed-graph-discrete-math

Configure model and sampling:

uv run vf-eval loong-seed-graph-discrete-math -m gpt-4.1-mini -n 20 -r 3 -t 1024 -T 0.7 -a '{"key": "value"}'  # env-specific args as JSON

Notes:

Use -a / --env-args to pass environment-specific configuration as a JSON object.

Key metrics emitted by the rubric and how to interpret them:

Metric	Meaning
`reward`	Main scalar reward: 1.0 if the parsed response matches the reference answer, else 0.0