Reasoning Token Allocation
Fresh
LLMs can spend a large number of reasoning tokens when left uncapped, getting stuck in loops or over-optimizing. In this environment, an agent learns to set a cap on its reasoning-token spending while achieving nearly the same accuracy on the GSM8K dataset.
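The trade-off described above can be sketched as a reward that credits a correct answer and charges for tokens spent under the agent's chosen cap. This is only an illustrative sketch, not the environment's actual reward: the names `EpisodeResult` and `budget_reward`, and the parameters `max_budget` and `cost_weight`, are hypothetical and do not come from the listing.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """Outcome of one problem attempt (hypothetical structure)."""
    correct: bool      # whether the final answer matched the reference
    tokens_used: int   # reasoning tokens the model actually generated
    token_cap: int     # cap the agent chose before reasoning


def budget_reward(result: EpisodeResult,
                  max_budget: int = 2048,
                  cost_weight: float = 0.5) -> float:
    """Assumed reward shape: accuracy minus a scaled token cost.

    max_budget and cost_weight are illustrative values, not settings
    taken from the environment.
    """
    if result.token_cap <= 0 or max_budget <= 0:
        raise ValueError("token_cap and max_budget must be positive")

    accuracy_term = 1.0 if result.correct else 0.0
    # Spending can never exceed the cap the agent set for itself.
    spent = min(result.tokens_used, result.token_cap)
    # Normalize the cost to [0, cost_weight].
    cost_term = cost_weight * min(spent / max_budget, 1.0)
    return accuracy_term - cost_term


if __name__ == "__main__":
    cheap = EpisodeResult(correct=True, tokens_used=512, token_cap=1024)
    costly = EpisodeResult(correct=True, tokens_used=2048, token_cap=2048)
    print(budget_reward(cheap), budget_reward(costly))
```

Under this assumed shape, a correct answer reached with fewer tokens scores higher than the same answer reached at the full budget, which is the behavior the environment is meant to encourage.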
- Type
- RL Env
- Runtime
- ORS
- License
- unknown
- Published
- Mar 2026