BlicketTest CausalReasoning

A multi-turn causal reasoning environment in which an LLM explores a Blicket-detecting machine to identify which objects activate it under a hidden rule.
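To make the task concrete, here is a minimal toy sketch of a Blicket machine and a brute-force explorer. The listing does not describe the environment's actual rules, action format, or scoring, so everything here is an assumption: the class `BlicketMachine`, the function `explore`, and the two rule types ("disjunctive", where any Blicket activates the machine, and "conjunctive", where all Blickets must be present) are hypothetical names borrowed from the classic Blicket paradigm, not the environment's API.

```python
import random


class BlicketMachine:
    """Hypothetical toy detector: activates according to a hidden rule."""

    def __init__(self, num_objects=4, num_blickets=2, rule="disjunctive", seed=None):
        rng = random.Random(seed)
        self.num_objects = num_objects
        # Hidden ground truth: which object indices are Blickets.
        self.blickets = frozenset(rng.sample(range(num_objects), num_blickets))
        self.rule = rule  # assumed rule types: "disjunctive" or "conjunctive"

    def test(self, placed):
        """Return True if the machine activates for the objects placed on it."""
        on_machine = self.blickets & set(placed)
        if self.rule == "disjunctive":
            return len(on_machine) >= 1      # any single Blicket is enough
        return on_machine == self.blickets   # conjunctive: every Blicket is required


def explore(machine):
    """Identify the Blickets using only calls to machine.test()."""
    everything = set(range(machine.num_objects))

    # Step 1: test each object alone. Any that activates the machine is a Blicket.
    alone = {i for i in everything if machine.test({i})}
    if alone:
        return frozenset(alone)

    # Step 2: nothing works alone, so assume a conjunctive rule. Remove one
    # object at a time from the full set; if the machine stops activating,
    # the removed object was required.
    return frozenset(i for i in everything if not machine.test(everything - {i}))


if __name__ == "__main__":
    machine = BlicketMachine(num_objects=5, num_blickets=2, rule="conjunctive", seed=0)
    print("inferred:", sorted(explore(machine)), "actual:", sorted(machine.blickets))
```

This explorer spends at most twice as many tests as there are objects. An LLM agent in a multi-turn setting would choose tests one turn at a time instead, and the real environment may limit the number of steps or use other rules.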

Domain
rl-env
License
unknown
Published
Feb 2026


Attribution

Leaderboard scores: prime-hub

Top score: 62.0% by Qwen3 30B A3B Instruct 2507; 3 models reporting (1 frontier).

Score history

[Chart: score history. Y-axis: score, 45% to 100%. X-axis: Oct 24 to Jun 25. Series: Claude 3.5 Haiku, Qwen3 30B A3B Instruct 2507.]

Top models

[Bar chart: BlicketTest CausalReasoning, 3 models. Highest value: Qwen3 30B A3B Instruct at 73.9.]

Related tools


Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is BlicketTest CausalReasoning?
It is a multi-turn causal reasoning environment in which an LLM explores a Blicket-detecting machine to identify which objects activate it under a hidden rule.
What is the current top score on BlicketTest CausalReasoning?
The top reported score is 62.0% by Qwen3 30B A3B Instruct 2507, across 3 models reporting (1 from frontier labs).
How can a model improve its BlicketTest CausalReasoning score?
Tools linked to BlicketTest CausalReasoning on Sophon include Blickettest Causalreasoning RL Env (Community). Linked tools are RL environments, datasets, and scaffolds that target this eval.
What license is BlicketTest CausalReasoning under?
The license for BlicketTest CausalReasoning is listed as unknown.