VeriSciQA

Description

VeriSciQA is an environment for evaluating scientific visual question answering. It contains 20,351 multiple-choice questions paired with scientific figures from research papers, spanning 20 scientific domains (Biology, Physics, Chemistry, Computer Science, Mathematics, etc.) and 12 figure types (graphs, diagrams, charts, tables, etc.).

Capabilities

Scientific visual question answering
Understanding scientific figures from research papers
Multiple-choice evaluation across 20 domains and 12 figure types

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

CC BY-SA 4.0.

Tasks

There is one split in this environment:

train: 20,351 tasks

Questions span 20 scientific domains and 12 figure types including line plots, bar charts, scatter plots, diagrams, heatmaps, and composite figures.

Reward Structure

Single-turn evaluation with deterministic grading. The agent submits a single letter answer (A, B, C, or D) via the submit_answer tool. The submitted answer is compared via exact match against the ground truth. Reward is 1.0 if correct, 0.0 if incorrect.
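The grading rule above can be sketched in a few lines. This is a minimal illustration, not the environment's actual grader: the function names `grade` and `accuracy` are hypothetical, and the sketch assumes strict string equality with no case or whitespace normalization, which the description does not specify.

```python
def grade(submitted: str, ground_truth: str) -> float:
    """Return 1.0 if the submitted letter exactly matches the ground truth, else 0.0."""
    # Assumption: strict equality, so "b" or " B" would not match "B".
    return 1.0 if submitted == ground_truth else 0.0


def accuracy(submissions: list[str], answers: list[str]) -> float:
    """Mean reward over a batch of tasks, i.e. the fraction answered correctly."""
    if len(submissions) != len(answers):
        raise ValueError("submissions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy over zero tasks")
    rewards = [grade(s, a) for s, a in zip(submissions, answers)]
    return sum(rewards) / len(rewards)


# Three of four answers match, so the mean reward is 0.75.
print(accuracy(["A", "C", "D", "B"], ["A", "B", "D", "B"]))
```

Because each reward is either 0.0 or 1.0, the mean reward over a set of tasks is the same as accuracy on that set.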

Data

JSONL metadata with images (~20,351 JPG files) sourced from the Hugging Face dataset datajuicer/VeriSciQA. The data is stored on the OpenReward platform.
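For readers unfamiliar with JSONL, the following sketch shows how metadata in that format is typically parsed: one JSON object per line. The record schema is not documented here, so the field names in the sample (`question`, `options`, `answer`, `image`) and its values are purely illustrative assumptions, as is the helper name `parse_jsonl`.

```python
import json
from typing import Iterable


def parse_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse an iterable of JSONL lines (e.g. an open file) into a list of dicts."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


# Hypothetical record: these field names and values are assumptions for
# illustration only, not the dataset's actual schema.
sample_lines = [
    '{"question": "Which curve peaks first?", '
    '"options": ["A", "B", "C", "D"], "answer": "B", "image": "images/0001.jpg"}',
    "",
]

records = parse_jsonl(sample_lines)
print(len(records), records[0]["answer"])
```

The same function accepts an open file handle, since iterating over a text file yields its lines.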

Tools

Tool	Description
`submit_answer`	Submit a single letter answer (A, B, C, or D). Deterministic evaluation via exact match. Ends the episode.

Time Horizon

Single-turn. The agent reads the question and views the scientific figure, then submits one answer for a total of one tool call.
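The single-turn lifecycle can be modelled as an episode that ends on the first submission. This is a rough sketch under stated assumptions: the class `SingleTurnEpisode` is hypothetical and does not represent the OpenReward API, and raising an error on a second submission is an assumed behavior, since the description only states that `submit_answer` ends the episode.

```python
class SingleTurnEpisode:
    """Toy model of one task: a single submit_answer call ends the episode."""

    def __init__(self, ground_truth: str):
        self.ground_truth = ground_truth
        self.done = False
        self.reward = None

    def submit_answer(self, answer: str) -> float:
        # Assumption: a second call is rejected because the episode is over.
        if self.done:
            raise RuntimeError("episode already ended")
        self.reward = 1.0 if answer == self.ground_truth else 0.0
        self.done = True
        return self.reward


episode = SingleTurnEpisode(ground_truth="C")
print(episode.submit_answer("C"), episode.done)
```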

Environment Difficulty

VeriSciQA evaluates scientific visual reasoning. Accuracy by model type:

Model Type	Accuracy
Best Proprietary Model	82%
Open-Source Models	~64%

The gap of roughly 18 percentage points between the best proprietary model and open-source models suggests that VeriSciQA remains a challenging benchmark for scientific visual reasoning.

Other Environment Requirements

There are no further environment requirements; VeriSciQA works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in VeriSciQA answer scientific visual questions in a standard environment. The environment does not present direct safety risks.

Citation

@article{verisciqa2025,
  title={VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering},
  author={DataJuicer Team},
  journal={arXiv preprint arXiv:2511.19899},
  year={2025}
}