Medarc is a team.
MedXpertQA is a highly challenging and comprehensive benchmark designed to evaluate expert-level medical knowledge and advanced reasoning capabilit...
Medication QA (MedInfo 2019) – consumer medication question answering benchmark
Single-turn medical MCQ
MedRBench evaluation environment for medical reasoning benchmarks
MedMCQA evaluation environment
Medical hallucination detection benchmark
A realistic virtual EHR environment to benchmark medical LLM agents on clinical tasks.
Med-HALT (Reasoning) evaluation environment for medical LLMs
LongHealth: A Question Answering Benchmark with Long Clinical Documents (20 patients, 400 multiple-choice questions)
HEAD-QA v2 environment
MetaMedQA medical MCQ evaluation
Multi-agent medical diagnosis environment for evaluating LLMs on clinical diagnosis through interactive conversations.
MedExQA Evaluation: Medical QA with Multiple Explanations
Evaluation environment for the Joshua-Harris/PubHealthBench public health MCQ dataset
Single-turn medicine MCQ
OpenAI HealthBench evaluation (Arora et al., 2025)
MedDialog is a benchmark of real-world doctor-patient conversations focused on health-related concerns and advice and tests a model's ability to su...
ACI-Bench evaluation environment
Medical error detection and correction in clinical notes (Ben Abacha et al., 2024)
SCT-Bench Public Environment
HEAD-QA environment
Evaluation environment for the HPAI-BSC/CareQA MCQ dataset
MedCaseReasoning medical diagnosis evaluation
MTSamples Procedures is a benchmark of medical transcription samples that tests a model's ability to generate coherent and clinically accurate proc...
MTSamples Replicate is a benchmark of transcribed medical reports that evaluates a model’s ability to generate clinically appropriate treatment pla...
MedAgentBench V2 environment for tool-calling evaluation.
MedQA Evaluation
MedCalc-Bench clinical calculator evaluation