GeneralReasoner

Implementation of https://huggingface.co/datasets/TIGER-Lab/WebInstruct-verified

Type: RL Env
Runtime: ORS
License: unknown
Size: 229,736 tasks
Published: Jan 2026

OpenReward Environment

Description

GeneralReasoner is an environment for evaluating general reasoning capabilities using the WebInstruct-verified dataset from the General-Reasoner project by TIGER-AI-Lab. It provides diverse reasoning tasks spanning multiple categories and difficulty levels, with LLM-based semantic grading for flexible answer evaluation.

Capabilities

  • General reasoning evaluation across multiple domains
  • Multi-category question answering
  • Semantic answer verification
  • Varied difficulty levels

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

Tasks

There are two splits in this environment:

  • train: 228,736 tasks
  • test: 1,000 tasks

Tasks span multiple categories with varying difficulty levels.

Reward Structure

This is a single-turn environment. The agent submits an answer via the answer tool. An LLM grader (gpt-5-mini) evaluates semantic correctness against the reference answer. Reward is binary: 1.0 if correct, 0.0 if incorrect.
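
The grading flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the environment's actual implementation: the names `build_grader_prompt` and `binary_reward`, the prompt wording, and the CORRECT/INCORRECT verdict format are all hypothetical. The grader is passed in as a plain callable so that any LLM client can be substituted.

```python
from typing import Callable


def build_grader_prompt(question: str, reference: str, submitted: str) -> str:
    """Hypothetical prompt; the environment's real grading prompt is not documented."""
    return (
        "Decide whether the submitted answer is semantically equivalent "
        "to the reference answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Submitted answer: {submitted}\n"
        "Reply with exactly one word: CORRECT or INCORRECT."
    )


def binary_reward(
    question: str,
    reference: str,
    submitted: str,
    grader: Callable[[str], str],
) -> float:
    """Return 1.0 if the grader judges the answer correct, else 0.0."""
    verdict = grader(build_grader_prompt(question, reference, submitted))
    # Assumed verdict format: anything other than a clean "CORRECT" scores 0.0.
    return 1.0 if verdict.strip().upper() == "CORRECT" else 0.0


# Stand-in grader for illustration only; a real one would call an LLM.
def toy_grader(prompt: str) -> str:
    return "CORRECT" if "Submitted answer: 0.5" in prompt else "INCORRECT"
```

Treating any unexpected grader output as incorrect keeps the reward strictly binary, which matches the 1.0 / 0.0 scheme stated above.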

Data

Data consists of Parquet files sourced from the WebInstruct-verified dataset. Each row contains a question, answer, answer type, category, and difficulty level. Data is stored on the OpenReward platform.
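
As a rough sketch of the row layout described above, a task could be modeled like this. The class name `ReasoningTask`, the helper `task_from_row`, and the exact field names are assumptions for illustration; the actual Parquet column names may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningTask:
    """One dataset row; field names are assumed, not read from the Parquet schema."""
    question: str
    answer: str
    answer_type: str
    category: str
    difficulty: str


def task_from_row(row: dict) -> ReasoningTask:
    """Build a task from a dict-like row, failing loudly on missing columns."""
    fields = ("question", "answer", "answer_type", "category", "difficulty")
    missing = [name for name in fields if name not in row]
    if missing:
        raise KeyError(f"row is missing columns: {missing}")
    # Values are coerced to str so that numeric answers compare uniformly.
    return ReasoningTask(**{name: str(row[name]) for name in fields})
```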

Tools

Tool      Description
answer    Submit your answer for LLM grading. Ends the episode.

Time Horizon

Single-turn. The agent reads the question and submits one answer.

Environment Difficulty

Difficulty statistics have not yet been reported for this environment.

Other Environment Requirements

An OpenAI API key is required for LLM-based grading. Pass it via secrets={"openai_api_key": "..."}.
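
One way to build that secrets mapping without hard-coding the key is to read it from the process environment. This is a sketch only: the helper `build_secrets` and the `OPENAI_API_KEY` variable name are assumptions, and the call that actually receives `secrets=` is not shown because its signature is not documented here.

```python
import os
from typing import Mapping


def build_secrets(env: Mapping[str, str] = os.environ) -> dict:
    """Build the secrets mapping from an environment variable (name assumed)."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        # Fail early rather than let grading fail mid-episode.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {"openai_api_key": key}
```

The resulting dictionary has the same shape as the `secrets` argument shown above.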

Safety

Agents in GeneralReasoner answer reasoning questions in a standard environment. The environment does not present direct safety risks.

Citation

@inproceedings{ma2025generalreasoner,
  title={General-Reasoner: Advancing {LLM} Reasoning Across All Domains},
  author={Ma, Xueguang and Liu, Qian and Jiang, Dongfu and Zhang, Ge and Ma, Zejun and Chen, Wenhu},
  booktitle={Proceedings of the Neural Information Processing Systems (NeurIPS)},
  year={2025}
}