Vf Binary Liar

Single-turn number-guessing environment with a probe tool and noisy hints, designed for calibration and tool-use evaluation.
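
This page does not describe the environment's rules, so the following is only a minimal sketch of what a number-guessing task with a probe tool and noisy hints might look like. The class name `NoisyGuessEnv`, the `probe` method, the `lie_prob` parameter, and the majority-vote strategy are all hypothetical illustrations, not the actual Vf Binary Liar implementation.

```python
import random


class NoisyGuessEnv:
    """Hypothetical number-guessing environment whose probe may lie."""

    def __init__(self, low, high, lie_prob, seed=0):
        self.low, self.high = low, high
        self.lie_prob = lie_prob  # assumed chance that a hint is flipped
        self.rng = random.Random(seed)
        self.secret = self.rng.randint(low, high)

    def probe(self, guess):
        """Hint for 'is the secret greater than guess?'; may be a lie."""
        truth = self.secret > guess
        lied = self.rng.random() < self.lie_prob
        return (not truth) if lied else truth


def guess_with_votes(env, votes=1):
    """Binary search; each comparison is a majority vote over `votes` probes.

    Use an odd `votes` so the majority is never tied.
    """
    lo, hi = env.low, env.high
    while lo < hi:
        mid = (lo + hi) // 2
        higher = sum(env.probe(mid) for _ in range(votes))
        if 2 * higher > votes:
            lo = mid + 1  # majority says the secret is above mid
        else:
            hi = mid      # majority says the secret is at or below mid
    return lo
```

With `lie_prob=0.0` the search always recovers the secret. With noisy hints, repeating each probe trades extra tool calls for a lower chance of being misled, which is one way such a task could exercise both tool use and calibration.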

Domain: rl-env
License: MIT
Published: Oct 2025

Attribution

Leaderboard scores: prime-hub

Top score: 39.21 by GPT-5 Mini. 6 models reporting (2 from frontier labs).

Score history

[Chart: score history for 6 models. Y-axis: score, 0 to 50. X-axis ticks: Apr 25, Jun 25, Aug 25. Labeled models: Qwen3 235B A22B, GPT-5 Mini.]

Top models

[Bar chart: 6 models. Highest value: GPT-5 Mini at 39.2.]

Related tools (1)

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is Vf Binary Liar?
Single-turn number-guessing environment with a probe tool and noisy hints, designed for calibration and tool-use evaluation.
What is the current top score on Vf Binary Liar?
The top reported score is 39.21 by GPT-5 Mini, across 6 models reporting (2 from frontier labs).
How can a model improve its Vf Binary Liar score?
Tools linked to Vf Binary Liar on Sophon include Binary LIAR RL Env (Community). Linked tools cover RL environments, datasets, and scaffolds that target this eval.
What license is Vf Binary Liar under?
Vf Binary Liar is available under the MIT license.