Certainty Collapse RL Env (Community)

Reward Hacking Sprint: does optimizing self-certainty (an RLIF-style intrinsic reward) make models confidently wrong on math? GSM8K, Llama-3.2-...
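
For readers unfamiliar with the term, here is a minimal sketch of one common way to compute a self-certainty score: the mean KL divergence from the uniform distribution to the model's next-token distribution, averaged over generated positions. Whether this environment uses exactly this formula is an assumption; the function names, the plain-list logits input, and the `threshold` parameter are hypothetical and chosen for illustration only.

```python
import math

def self_certainty(token_logits):
    """Mean KL(U || p) over positions (assumed definition, not confirmed by the source).

    token_logits: list of per-position logit lists, one list per generated token.
    Returns 0.0 for uniform predictions and grows as predictions get more peaked.
    """
    total = 0.0
    for logits in token_logits:
        # Numerically stable log-softmax over the vocabulary.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        log_probs = [x - log_z for x in logits]
        v = len(logits)
        # KL(U || p) = -log V - (1/V) * sum_j log p_j
        total += -math.log(v) - sum(log_probs) / v
    return total / len(token_logits)

def is_confidently_wrong(certainty, predicted, gold, threshold):
    """Hypothetical flag: high self-certainty paired with an incorrect final answer."""
    return certainty >= threshold and predicted != gold
```

A reward-hacking check along these lines would track the fraction of GSM8K samples flagged by `is_confidently_wrong` as training on the intrinsic reward proceeds; a rising fraction would suggest certainty is being optimized independently of correctness.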

Type: RL Env
Capabilities: Math
Tags: RLIF, self-certainty, GSM8K, reward hacking
License: unknown
Version: v0.1.2
Published: May 2026
Canonical: app.primeintellect.ai/dashboard/environments/cardan05/certainty-collapse

Attribution

README: api.primeintellect.ai/api/v1/environmentshub/cardan05/certainty-collapse/@0.1.2/inspect

Lift evidence

Eval	Tools known to lift	Source paper
GSM8K	Certainty Collapse RL Env (Community)	-
GSM8K: Grade School Math Word Problems	Certainty Collapse RL Env (Community)	-
Grade School Math 8K	Certainty Collapse RL Env (Community)	-

Contributors

cardan05