Reasoning Token Allocation

LLMs can spend a large number of reasoning tokens when left uncapped, getting stuck in loops or over-optimizing. The key idea of this environment is for an agent to learn to set a cap on its reasoning-token spending while achieving nearly the same accuracy on the GSM8K dataset.
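The listing does not say how the environment scores an episode, so the following is only a minimal sketch of one way such a trade-off could be expressed. The function name `capped_reasoning_reward`, the `max_cap` ceiling, and the `cost_weight` parameter are all hypothetical and are not taken from the environment itself.

```python
def capped_reasoning_reward(
    correct: bool,
    token_cap: int,
    max_cap: int = 4096,       # assumed upper bound on the cap (hypothetical)
    cost_weight: float = 0.5,  # assumed weight of the token cost (hypothetical)
) -> float:
    """Illustrative reward: credit for a correct answer minus a cost for the chosen cap.

    A correct answer earns 1.0. The cap chosen by the agent is charged
    linearly, so a smaller cap that still yields a correct answer scores higher.
    """
    if not 0 < token_cap <= max_cap:
        raise ValueError("token_cap must be in the range 1..max_cap")

    accuracy_term = 1.0 if correct else 0.0
    cost_term = cost_weight * (token_cap / max_cap)
    return accuracy_term - cost_term


# Example: a correct answer under a tight cap beats one under the maximum cap.
tight = capped_reasoning_reward(correct=True, token_cap=1024)
loose = capped_reasoning_reward(correct=True, token_cap=4096)
```

Under these assumptions, an agent is pushed toward the smallest cap that keeps its answers correct, which matches the stated goal of near-equal accuracy at lower token spend.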

Type
RL Env
Runtime
ORS
License
unknown
Published
Mar 2026
