Question 1

What is Medqa Followup?

Accepted Answer

Multi-turn robustness evaluation for medical LLMs - tests whether models maintain correct answers when challenged with follow-up interventions

Question 2

What is the current top score on Medqa Followup?

Accepted Answer

The top reported score is 10.8% by GPT-4o, across 2 models reporting (2 from frontier labs).

Question 3

How can a model improve its Medqa Followup score?

Accepted Answer

Tools linked to Medqa Followup on Sophon include Medqa Followup RL Env (Community) - RL environments, datasets, and scaffolds that target this eval.

Question 4

What license is Medqa Followup under?

Accepted Answer

Medqa Followup is available under unknown.

Medqa Followup

Score history