Curvebench Hard Env

Frontier

CurveBench-Hard: Vision-language model evaluation for hierarchical tree structure extraction from complex images

Domain: rl-env
License: MIT
Published: Feb 2026

Attribution

Leaderboard scores: prime-hub

Top score: 22.8% by Gemini 3.1 Pro Preview, with 15 models reporting (5 from frontier labs).

Score history

[Chart: score history from Aug 25 to Feb 26, with scores on a 0% to 100% axis. Labeled models: GPT-5 Mini, Qwen3 VL 235B A22B Thinking, Gemini 3 Pro Preview, Gemini 3.1 Pro Preview.]

Top models

[Bar chart: Curvebench Hard Env scores for 15 models. Highest value: Gemini 3.1 Pro Preview at 22.8.]

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is Curvebench Hard Env?
CurveBench-Hard: Vision-language model evaluation for hierarchical tree structure extraction from complex images
What is the current top score on Curvebench Hard Env?
The top reported score is 22.8% by Gemini 3.1 Pro Preview, across 15 models reporting (5 from frontier labs).
How can a model improve its Curvebench Hard Env score?
Tools linked to Curvebench Hard Env on Sophon include HARD ENV RL Env (Community). Linked tools are RL environments, datasets, and scaffolds that target this eval.
What license is Curvebench Hard Env under?
Curvebench Hard Env is available under the MIT license.