anusha is an RL env contributor.
RL environment for asymmetric-info debate with sophistry-decomposed verifier
Single-agent advocacy variant of sophistry-bench for the Prime Intellect Reward Hacking Sprint. Pre-registered hypothesis: training Llama-3.2-1B on...