MetaMedQA

Evaluation environment for the MetaMedQA dataset.

Overview

Environment ID: metamedqa
Short description: Single-turn medical multiple-choice QA drawn from multiple medical exam sources
Tags: medical, single-turn, multiple-choice, eval

Type: single-turn
Rubric overview: Binary scoring (1.0 / 0.0) based on correct letter or answer text match

Run an evaluation with default settings:

prime eval run metamedqa -m "openai/gpt-5-mini" -n 5 -s

Configure model and sampling:

medarc-eval metamedqa -m "openai/gpt-5-mini" -n 20 --shuffle-answers --shuffle-seed 1618

| Arg | Type | Default | Description |
| --- | --- | --- | --- |
| `split` | str | `"test"` | Dataset split to use |
| `shuffle_answers` | bool | `False` | Whether to shuffle answer choices |
| `shuffle_seed` | int \| None | `1618` | Seed for deterministic answer shuffling |
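
To illustrate how `shuffle_answers` and `shuffle_seed` might interact, here is a minimal sketch of seeded answer shuffling. It is not taken from the environment's code: the function name `shuffle_options`, the letter-to-text dictionary layout, and the choice to keep the letters fixed while moving the answer texts are all assumptions made for illustration.

```python
import random
from typing import Dict, Optional, Tuple


def shuffle_options(
    options: Dict[str, str],
    answer_letter: str,
    seed: Optional[int] = 1618,
) -> Tuple[Dict[str, str], str]:
    """Hypothetical helper: reassign answer texts to letters in a seeded order.

    Returns the shuffled options and the letter that now holds the gold answer.
    """
    letters = list(options.keys())
    # Shuffle positions rather than texts so duplicate texts stay unambiguous.
    order = list(range(len(letters)))
    rng = random.Random(seed)  # seed=None falls back to a non-deterministic seed
    rng.shuffle(order)

    # Letter i now shows the text that originally sat at position order[i].
    shuffled = {letters[i]: options[letters[j]] for i, j in enumerate(order)}

    # Find where the original gold position ended up.
    gold_index = letters.index(answer_letter)
    new_gold = letters[order.index(gold_index)]
    return shuffled, new_gold
```

With a fixed integer seed, repeated calls on the same question give the same ordering, which is what makes a shuffled run reproducible. Passing `None` in this sketch gives a different ordering on each call.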

| Metric | Meaning |
| --- | --- |
| `accuracy` | Weight 1.0. Scores 1.0 if the parsed letter matches the gold letter, otherwise 0.0 |
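
The binary scoring described above could look roughly like the following sketch. The parsing rules (an "answer is X" pattern, or a completion consisting of a single letter), the optional answer-text fallback, and the names `parse_letter` and `accuracy` are assumptions for illustration. The environment's actual parser may differ.

```python
import re
from typing import Optional

# Assumed pattern: "answer is C", "Answer: (C)", and similar phrasings.
_ANSWER = re.compile(r"answer\s*(?:is|:)?\s*\(?([A-Z])\)?\b", re.IGNORECASE)
# Assumed pattern: the whole completion is just a letter, e.g. "B", "(b)", "C."
_BARE = re.compile(r"\(?([A-Za-z])\)?[.)]?")


def parse_letter(completion: str) -> Optional[str]:
    """Hypothetical parser: return the chosen option letter, or None."""
    match = _ANSWER.search(completion)
    if match is None:
        match = _BARE.fullmatch(completion.strip())
    return match.group(1).upper() if match else None


def accuracy(
    completion: str,
    gold_letter: str,
    gold_text: Optional[str] = None,
) -> float:
    """Return 1.0 for a correct answer and 0.0 otherwise."""
    letter = parse_letter(completion)
    if letter is not None:
        return 1.0 if letter == gold_letter.upper() else 0.0
    # Assumed fallback: with no parsable letter, accept the gold answer text.
    if gold_text and gold_text.lower() in completion.lower():
        return 1.0
    return 0.0
```

For example, `accuracy("The answer is C.", "C")` scores 1.0, while a completion that names a different letter scores 0.0 even if it also mentions the gold answer text.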

This environment was put together by:

Aymane Ouraq - (@aymaneo)