
Hanabi

Multi-turn cooperative card game environment where models play Hanabi by making strategic moves based on partial information.

Domain: rl-env
License: unknown
Published: Oct 2025

Attribution

Leaderboard scores: prime-hub

Top score 4.0% by o4 Mini - 2 models reporting (2 frontier)

Score history

Chart: scores on a 0% to 100% axis from Jul 24 to Mar 25, with series for GPT-4o-mini and o4 Mini.

Top models

Bar chart of 2 models. Highest value: o4 Mini at 4.

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is Hanabi?
Multi-turn cooperative card game environment where models play Hanabi by making strategic moves based on partial information.
What is the current top score on Hanabi?
The top reported score is 4.0% by o4 Mini, across 2 models reporting (2 from frontier labs).
How can a model improve its Hanabi score?
Tools linked to Hanabi on Sophon include Hanabi RL Env (Community). Linked tools are RL environments, datasets, and scaffolds that target this eval.
What license is Hanabi under?
The license for Hanabi is unknown.