LegalBench

Frontier

162 collaboratively curated legal-reasoning tasks across rule-recall, issue-spotting, application, and interpretation - the standard legal LLM benchmark.

Open

Publisher: Stanford University
Capabilities: Legal Reasoning, Factual Recall
Domain: legal
Format: HF Dataset
Size: 90000 examples
License: CC-BY-4.0
Published: May 2026
Notable for: Benchmark for evaluating legal reasoning and factual recall in the legal domain.
Canonical: hazyresearch.stanford.edu/legalbench
Also on: huggingface.co/datasets/nguha/legalbench

Attribution

Leaderboard scores: prime-hub

Top score 72.0% by GPT-4o-mini - 5 models reporting (5 frontier)

Top models

[Bar chart: LegalBench scores for 5 models. Highest value: GPT-4o-mini at 72.]

Where it's ranked

Vals AI LegalBench (Vals AI): aggregated with 1 other · monthly

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

Legalbench RL Env (Community)

LegalBench environment for legal reasoning tasks

Implementation · RL Env · Legalbench · Legal · Reasoning

Legalbench RL Env (Community)

Prime Intellect Verifiers environment for LegalBench legal reasoning tasks

Implementation · RL Env

Legalbench RL Env (Prime Intellect)

Prime Intellect

LegalBench environment for legal reasoning tasks

Implementation · RL Env · Legalbench · Legal · Reasoning

Legalbench RL Env (Prime Community)

Prime Community

LegalBench environment for legal reasoning tasks

Implementation · RL Env · Legalbench · Legal · Reasoning

Legalbench RL Env (Community)

LegalBench environment for legal reasoning tasks

Implementation · RL Env · Legalbench · Legal · Reasoning

Abercrombie RL Env (Community)

Teach a small model to classify trademarks on the Abercrombie distinctiveness spectrum

Trains toward · RL Env · Law · Trademark · Abercrombie

Tomagpt RL Env (Community)

Legal reasoning and hearsay classifier

Trains toward · RL Env · Law · Hearsay · Legalbench
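The Abercrombie environment listed above trains a model to place a trademark on the distinctiveness spectrum. As a rough illustration of how an RL environment of this kind might score a completion, the sketch below awards 1.0 when the last spectrum label mentioned in the model's output matches the gold label. The function names and the last-mention extraction rule are hypothetical choices made for this example; they are not taken from the listed environment's code.

```python
import re
from typing import Optional

# The five categories of the Abercrombie distinctiveness spectrum,
# ordered from least to most distinctive.
ABERCROMBIE_LABELS = ("generic", "descriptive", "suggestive", "arbitrary", "fanciful")

# Matches any label as a whole word (applied to lowercased text).
_LABEL_PATTERN = re.compile(r"\b(" + "|".join(ABERCROMBIE_LABELS) + r")\b")


def extract_label(completion: str) -> Optional[str]:
    """Return the last Abercrombie label mentioned in the completion, if any.

    Using the last mention is an assumption: it lets the model reason
    through alternatives before committing to a final answer.
    """
    matches = _LABEL_PATTERN.findall(completion.lower())
    return matches[-1] if matches else None


def abercrombie_reward(completion: str, gold: str) -> float:
    """Hypothetical exact-match reward: 1.0 if the labels agree, else 0.0."""
    predicted = extract_label(completion)
    return 1.0 if predicted == gold.strip().lower() else 0.0
```

A binary exact-match reward is easy to verify and hard to game on a five-way classification task, though a real environment might also check output format or use a different metric.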

FAQ

What is LegalBench?

LegalBench is a set of 162 collaboratively curated legal-reasoning tasks spanning rule-recall, issue-spotting, application, and interpretation. It is the standard legal LLM benchmark.

What capabilities does LegalBench test?

LegalBench evaluates legal reasoning and factual recall.

What is the current top score on LegalBench?

The top reported score is 72.0%, by GPT-4o-mini. 5 models report scores, all 5 from frontier labs.

How can a model improve its LegalBench score?

Sophon links RL environments, datasets, and scaffolds that target this eval, including Legalbench RL Env (Community), Legalbench RL Env (Prime Intellect), and Legalbench RL Env (Prime Community).

What license is LegalBench under?

LegalBench is available under CC-BY-4.0.