Reasoning Token Allocation

When left uncapped, LLMs can spend large numbers of reasoning tokens, getting stuck in loops or over-optimizing. The goal of this environment is for an agent to learn to set a cap on its reasoning-token spending while achieving nearly the same accuracy on the GSM8K dataset.
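
The trade-off described above can be illustrated with a small sketch. This is not the environment's actual reward; the function name `budget_reward`, the parameters `max_cap` and `penalty`, and the linear cost are all assumptions chosen only to show how a chosen token cap might be weighed against answer correctness.

```python
def budget_reward(correct: bool, cap: int, max_cap: int = 4096,
                  penalty: float = 0.5) -> float:
    """Hypothetical reward for one GSM8K problem.

    correct : whether the final answer matched the reference answer
    cap     : reasoning-token cap the agent chose for this problem
    max_cap : assumed largest cap the agent is allowed to choose
    penalty : assumed weight of the token cost relative to accuracy
    """
    if not 0 < cap <= max_cap:
        raise ValueError("cap must be in the range 1..max_cap")
    accuracy_term = 1.0 if correct else 0.0
    # Linear cost: a larger cap is charged more, even when the answer is right.
    cost_term = penalty * (cap / max_cap)
    return accuracy_term - cost_term


# A correct answer under a small cap scores higher than one under a large cap.
small = budget_reward(correct=True, cap=512)
large = budget_reward(correct=True, cap=4096)
print(small, large)
```

Under these assumptions, an agent keeps a high reward only by lowering the cap without losing correctness, which matches the stated goal of near-equal accuracy at lower reasoning cost.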

Type: RL Env
Tags: Logical Reasoning, Mathematical Reasoning
Runtime: ORS
License: unknown
Published: Mar 2026
Canonical: openreward.ai/sleek-panda/reasoning-token-allocation

Attribution

README: openreward.ai/sleek-panda/reasoning-token-allocation

Contributors

Đorđe Ražnatović