PubMedQA: A Dataset for Biomedical Research Question Answering

Active

Biomedical question answering (QA) dataset collected from PubMed abstracts.

Domain: Knowledge
License: MIT
Published: May 2026
Notable for: Benchmark for evaluating Knowledge.

Top score: 77.5% by Claude Opus 4.5 (2 models reporting, both from frontier labs)

Score history

[Chart: score (0% to 100%) over time (Sep 25, Oct 25, Nov 25) for 2 models: Claude Sonnet 4.5 and Claude Opus 4.5]

Top models

[Bar chart with 2 bars, one per reporting model. Highest value: Claude Opus 4.5 at 77.5.]

Related tools (8)

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is PubMedQA: A Dataset for Biomedical Research Question Answering?
Biomedical question answering (QA) dataset collected from PubMed abstracts.
What is the current top score on PubMedQA: A Dataset for Biomedical Research Question Answering?
The top reported score is 77.5% by Claude Opus 4.5, across 2 models reporting (2 from frontier labs).
How can a model improve its PubMedQA: A Dataset for Biomedical Research Question Answering score?
Tools linked to PubMedQA: A Dataset for Biomedical Research Question Answering on Sophon include Pubmedqa RL Env (Community), Openmed Pubmedqa RL Env (Community), and Pubmedqa RL Env (Medarc). These are RL environments, datasets, and scaffolds that target this eval.
What license is PubMedQA: A Dataset for Biomedical Research Question Answering under?
PubMedQA: A Dataset for Biomedical Research Question Answering is available under the MIT license.