KellyBench

KellyBench is a benchmark that tests an agent's ability to build machine learning models that predict football match outcomes and to bet on those predictions against market odds.

Type: RL Env
Publisher: General Reasoning
Tags: AI Research Tasks
Runtime: ORS
License: unknown
Size: 6 tasks
Published: Jan 2026
Canonical: openreward.ai/GeneralReasoning/KellyBench

Attribution

README: openreward.ai/GeneralReasoning/KellyBench
Scores: OpenReward

Public scores on this env

6 vf-eval reports across 5 models

1. GPT-5.4, OpenAI, 92063
2. Claude Opus 4.6, Anthropic, 88771
3. GLM 5, Zai, 48395
4. Gemini 3.1 Pro, Google (Alphabet Inc.), 34029
5. Kimi K2.5, Moonshot AI, 10421
