KellyBench
Fresh
KellyBench is a benchmark that tests an agent's ability to build machine learning models that predict football matches and to bet against market odds.
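The benchmark's staking and scoring rules are not stated here. As an illustration only, and assuming the name alludes to the Kelly criterion, the sketch below shows how a model's win probability and the market's decimal odds could be turned into a stake size. The function name `kelly_fraction` and its parameters are hypothetical and are not part of KellyBench.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Kelly stake as a fraction of bankroll for one binary bet.

    Illustrative sketch only: assumes a single bet that either pays
    `decimal_odds` per unit staked (stake included) or loses the stake.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be in [0, 1]")
    if decimal_odds <= 1.0:
        raise ValueError("decimal_odds must be greater than 1")

    b = decimal_odds - 1.0               # net profit per unit staked on a win
    f = (p_win * b - (1.0 - p_win)) / b  # classic Kelly formula: (p*b - q) / b
    return max(f, 0.0)                   # no bet when the estimated edge is negative


if __name__ == "__main__":
    # Model says 55% at even-money odds of 2.0: stake roughly 10% of bankroll.
    print(kelly_fraction(0.55, 2.0))
    # Model says 40% at the same odds: negative edge, so stake nothing.
    print(kelly_fraction(0.40, 2.0))
```

The stake is positive only when the model's probability exceeds the probability implied by the odds (1 / decimal_odds), which is one way to read "betting against market odds".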
- Type: RL Env
- Publisher: General Reasoning
- Runtime: ORS
- License: unknown
- Size: 6 tasks
- Published: Jan 2026
Public scores on this env
56 vf-eval reports across 5 models
1. GPT-5.4, OpenAI, 92063
2. Claude Opus 4.6, Anthropic, 88771
3. GLM 5, Zai, 48395
4. Gemini 3.1 Pro, Google (Alphabet Inc.), 34029
5. Kimi K2.5, Moonshot AI, 10421