semih

semih is a contributor of reinforcement learning (RL) environments.

Attribution

Tool contributions

Reward-hacking sprint environment: arithmetic tasks with planted sycophancy proxy

RL environment for KV-cache eviction policy optimization in LLM serving

UQ: Assessing Language Models on Unsolved Questions from Stack Exchange

No known affiliations.