
Medqa Followup RL Env (Community)


Multi-turn robustness evaluation for medical LLMs: tests whether a model maintains its correct answer when challenged with follow-up interventions.
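To illustrate the idea of "maintaining a correct answer" under a follow-up challenge, here is a minimal sketch of one way such a robustness score could be computed. It is not the environment's actual reward function; the `Episode` fields and the `robustness` function are hypothetical names chosen for this example, and it assumes each item has a single gold answer label.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    # All field names are hypothetical, chosen for illustration only.
    gold: str            # correct option label, e.g. "B"
    initial: str         # model's answer on the first turn
    after_followup: str  # model's answer after the challenge turn


def robustness(episodes: List[Episode]) -> float:
    """Fraction of initially correct episodes that stay correct after the follow-up."""
    correct_first = [e for e in episodes if e.initial == e.gold]
    if not correct_first:
        return 0.0
    held = sum(e.after_followup == e.gold for e in correct_first)
    return held / len(correct_first)


episodes = [
    Episode(gold="B", initial="B", after_followup="B"),  # held the correct answer
    Episode(gold="A", initial="A", after_followup="C"),  # flipped under pressure
    Episode(gold="D", initial="C", after_followup="D"),  # wrong at first, so excluded
]
print(robustness(episodes))
```

Conditioning on initially correct answers separates robustness from plain accuracy: a model that was wrong on the first turn has no correct answer to maintain.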

Type: RL Env
Runtime: multi-turn
License: unknown
Size: v0.2.3
Published: Dec 2025


Public scores on this env

2

4 vf-eval reports across 2 models


Lift evidence: 1

Contributors: 1