Medqa Followup RL Env (Community)
Fresh
Multi-turn robustness evaluation for medical LLMs - tests whether models maintain correct answers when challenged with follow-up interventions
- Type
- RL Env
- Runtime
multi-turn- License
- unknown
- Size
- v0.2.3
- Published
- Dec 2025
Cite
Notes
Only stored in your browser.
Public scores on this env
24 vf-eval reports across 2 models
Lift evidence
1| Eval | Tools known to lift | Source paper |
|---|---|---|
| MedQA: Medical exam Q&A benchmark | Medqa Followup RL Env (Community) | - |