VeriSciQA

VeriSciQA is a visual question answering dataset of 20,351 QA pairs spanning 20 scientific domains and 12 figure types.

Type: RL Env
Runtime: ORS
License: unknown
Size: 20,351 tasks
Published: Feb 2026

Description

VeriSciQA is an environment for evaluating scientific visual question answering. It contains 20,351 multiple-choice questions paired with scientific figures from research papers, spanning 20 scientific domains (Biology, Physics, Chemistry, Computer Science, Mathematics, etc.) and 12 figure types (graphs, diagrams, charts, tables, etc.).

Capabilities

  • Scientific visual question answering
  • Understanding scientific figures from research papers
  • Multiple-choice evaluation across 20 domains and 12 figure types

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

CC BY-SA 4.0.

Tasks

There is one split in this environment:

  • train: 20,351 tasks

Questions span 20 scientific domains and 12 figure types including line plots, bar charts, scatter plots, diagrams, heatmaps, and composite figures.

Reward Structure

Single-turn evaluation with deterministic grading. The agent submits a single letter answer (A, B, C, or D) via the submit_answer tool. The submitted answer is compared via exact match against the ground truth. Reward is 1.0 if correct, 0.0 if incorrect.
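The grading rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the environment's actual implementation: the function name `grade` and the `VALID_CHOICES` constant are hypothetical, and the sketch assumes a strict, case-sensitive comparison, since the source does not say whether submissions are normalized before matching.

```python
# Hypothetical sketch of exact-match grading; names are illustrative only.
VALID_CHOICES = ("A", "B", "C", "D")


def grade(submitted: str, ground_truth: str) -> float:
    """Return 1.0 if the submitted letter exactly matches the ground truth, else 0.0."""
    if ground_truth not in VALID_CHOICES:
        raise ValueError(f"ground truth must be one of {VALID_CHOICES}")
    # Assumption: strict comparison, with no case folding or whitespace stripping.
    return 1.0 if submitted == ground_truth else 0.0


if __name__ == "__main__":
    print(grade("B", "B"))  # 1.0
    print(grade("A", "B"))  # 0.0
```

Because the comparison is deterministic, grading the same submission twice always yields the same reward.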

Data

The data consists of JSONL metadata and images (approximately 20,351 JPG files) sourced from the Hugging Face dataset datajuicer/VeriSciQA. It is stored on the OpenReward platform.
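As a rough illustration of the JSONL layout, the sketch below parses one record per line into a small data class. The field names (`question`, `options`, `answer`, `image`) and the sample record are assumptions made up for this example; the actual schema of the metadata is not stated here and may differ.

```python
import io
import json
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Task:
    # All field names are hypothetical; check the real metadata before relying on them.
    question: str
    options: Dict[str, str]
    answer: str
    image: str


def read_tasks(lines: Iterable[str]) -> Iterator[Task]:
    """Parse JSONL text: one JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        yield Task(
            question=record["question"],
            options=record["options"],
            answer=record["answer"],
            image=record["image"],
        )


# A made-up placeholder record, used only to exercise the parser.
SAMPLE = io.StringIO(
    '{"question": "Which curve peaks first?", '
    '"options": {"A": "red", "B": "blue", "C": "green", "D": "black"}, '
    '"answer": "B", "image": "images/example_0001.jpg"}\n'
)

tasks = list(read_tasks(SAMPLE))
print(len(tasks), tasks[0].answer)
```

In practice the same function could be pointed at an open file handle instead of the in-memory sample.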

Tools

Tool | Description
submit_answer | Submit a single letter answer (A, B, C, or D). Deterministic evaluation via exact match. Ends the episode.

Time Horizon

Single-turn. The agent reads the question and views the scientific figure, then submits one answer for a total of one tool call.
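The one-call lifecycle can be pictured with the following sketch. The class `SingleTurnEpisode` is hypothetical and only mirrors the name of the `submit_answer` tool. Raising an error on a second call is an assumption chosen for illustration; the source only states that the first submission ends the episode.

```python
class SingleTurnEpisode:
    """Hypothetical model of a single-turn episode: one submission, then done."""

    def __init__(self, ground_truth: str) -> None:
        self.ground_truth = ground_truth
        self.done = False
        self.reward = None

    def submit_answer(self, letter: str) -> float:
        # Assumption: a second submission is rejected rather than silently ignored.
        if self.done:
            raise RuntimeError("episode has already ended")
        self.reward = 1.0 if letter == self.ground_truth else 0.0
        self.done = True  # the single tool call ends the episode
        return self.reward


if __name__ == "__main__":
    episode = SingleTurnEpisode(ground_truth="C")
    print(episode.submit_answer("C"))  # 1.0
    print(episode.done)  # True
```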

Environment Difficulty

VeriSciQA evaluates scientific visual reasoning:

Model Type | Accuracy
Best Proprietary Model | 82%
Open-Source Models | ~64%

The gap of roughly 18 percentage points between the best proprietary model (82%) and open-source models (~64%) suggests that VeriSciQA is a challenging benchmark for scientific visual reasoning.
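Since each task yields a reward of 1.0 or 0.0, accuracy is the mean reward over tasks, and the gap quoted above is the difference between two accuracies in percentage points. The sketch below shows that arithmetic; the helper names are hypothetical, and the 64 figure is approximate in the source.

```python
from typing import Sequence


def accuracy(rewards: Sequence[float]) -> float:
    """Mean of per-task rewards (each 1.0 or 0.0), as a fraction in [0, 1]."""
    if not rewards:
        raise ValueError("need at least one reward")
    return sum(rewards) / len(rewards)


def percentage_point_gap(percent_a: float, percent_b: float) -> float:
    """Difference between two accuracies that are already expressed in percent."""
    return percent_a - percent_b


if __name__ == "__main__":
    print(accuracy([1.0, 0.0, 1.0, 1.0]))  # 0.75
    # Figures from the table: 82% (best proprietary) and about 64% (open-source).
    print(percentage_point_gap(82.0, 64.0))  # 18.0
```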

Other Environment Requirements

There are no further environment requirements; VeriSciQA works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in VeriSciQA answer scientific visual questions in a standard environment. The environment does not present direct safety risks.

Citation

@article{verisciqa2025,
  title={VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering},
  author={DataJuicer Team},
  journal={arXiv preprint arXiv:2511.19899},
  year={2025}
}