RealWorldQA

RealWorldQA environment for evaluating vision-language models on real-world question answering

Domain: rl-env
License: unknown
Published: Oct 2025

Attribution

Leaderboard scores: prime-hub

Top score 60.0% by Gemini 2.5 Flash - 4 models reporting

Score history

Chart: scores (35% to 100%) from May 25 to Aug 25 for Gemini 2.5 Flash.

Top models

Bar chart of RealWorldQA scores with 4 bars (4 models). Highest value: Mistral Small 3.2 24B Instruct at 80.

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is RealWorldQA?
RealWorldQA is an environment for evaluating vision-language models on real-world question answering.
What is the current top score on RealWorldQA?
The top reported score is 60.0% by Gemini 2.5 Flash, across 4 models reporting.
How can a model improve its RealWorldQA score?
Tools linked to RealWorldQA on Sophon include Realworldqa RL Env (Community). Linked tools are RL environments, datasets, and scaffolds that target this eval.
What license is RealWorldQA under?
The license for RealWorldQA is listed as unknown.