
Twenty Questions

Frontier

Multi-turn game where models try to guess a secret word/object by asking strategic yes/no questions within 20 turns.
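Each yes/no answer carries at most one bit of information, so 20 questions can separate at most 2^20 = 1,048,576 possibilities, and only if every question splits the remaining candidates roughly in half. The sketch below illustrates that halving strategy. It assumes a hypothetical answerer that responds to set-membership questions ("Is it one of these?") rather than natural-language ones, and it counts the final "Is it X?" as a turn. The names `make_answerer`, `play` and `GameResult` are illustrative only. They are not the eval's actual environment API, prompts, or scoring.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

MAX_TURNS = 20  # turn budget stated for the game

@dataclass
class GameResult:
    solved: bool
    turns_used: int
    final_guess: Optional[str]

def make_answerer(secret: str) -> Callable[[Sequence[str]], bool]:
    """Hypothetical answerer: replies 'yes' iff the secret is in the queried subset."""
    def answer(subset: Sequence[str]) -> bool:
        return secret in subset
    return answer

def play(candidates: Sequence[str],
         answer: Callable[[Sequence[str]], bool],
         max_turns: int = MAX_TURNS) -> GameResult:
    """Halving strategy: each question asks about half of the remaining candidates."""
    remaining: List[str] = sorted(candidates)
    for turn in range(1, max_turns + 1):
        if len(remaining) == 1:
            # Only one candidate left: spend this turn on "Is it <x>?"
            guess = remaining[0]
            return GameResult(answer([guess]), turn, guess)
        mid = len(remaining) // 2
        first_half = remaining[:mid]
        # "Is it one of <first_half>?" and keep whichever half is consistent with the reply
        remaining = first_half if answer(first_half) else remaining[mid:]
    # Turn budget exhausted before the secret was named
    return GameResult(False, max_turns, None)

# Usage: 1,000 candidates need at most 10 halving questions plus one final guess.
result = play([f"item{i}" for i in range(1000)], make_answerer("item737"))
print(result.solved, result.turns_used, result.final_guess)
```

Under these assumptions, a 20-turn budget with a final naming turn covers up to 2^19 candidates. Real questions are rarely perfectly balanced, which is why question choice matters in this eval.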

Domain: rl-env
License: unknown
Published: Sep 2025



Attribution

Leaderboard scores: prime-hub

Top score: 1.29 by Qwen3 30B A3B (13 models reporting, 6 from frontier labs).

Score history: chart of scores (0 to 1.5) from Jul 2024 to Jul 2025 for 13 models, with labeled points for GPT-4o-mini, DeepSeek R1, Qwen3 8B and Qwen3 30B A3B.

Top models: bar chart of 13 models; the highest value is Qwen3 30B A3B at 1.3.

Related tools


Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is Twenty Questions?
Multi-turn game where models try to guess a secret word/object by asking strategic yes/no questions within 20 turns.
What is the current top score on Twenty Questions?
The top reported score is 1.29, by Qwen3 30B A3B. Thirteen models have reported scores, 6 of them from frontier labs.
How can a model improve its Twenty Questions score?
Sophon links RL environments, datasets, and scaffolds that target this eval; these currently include Twenty Questions RL Env (Community).
What license is Twenty Questions under?
The license for Twenty Questions is listed as unknown.