LegalBench

Frontier

162 collaboratively curated legal-reasoning tasks across rule-recall, issue-spotting, application, and interpretation - the standard legal LLM benchmark.

Domain: legal
Format: HF Dataset
Size: 90,000 examples
License: CC-BY-4.0
Published: May 2026
Notable for: Benchmark for evaluating legal reasoning and factual recall in the legal domain.

Attribution

Leaderboard scores: prime-hub

Top score: 72.0% by GPT-4o-mini, with 5 models reporting (5 frontier).
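This page does not state how a single leaderboard percentage is aggregated from the individual tasks. As a minimal sketch, assuming the score is an unweighted (macro) average of per-task exact-match accuracy, the arithmetic could look like the following. The function names, task names, and toy data are hypothetical and are not taken from LegalBench or prime-hub.

```python
def task_accuracy(predictions, labels):
    """Fraction of exact matches between predictions and gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot score an empty task")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)


def macro_average(results):
    """Mean of per-task accuracies, so every task counts equally.

    `results` maps a task name to a (predictions, labels) pair.
    Returns the overall score and the per-task breakdown.
    """
    scores = {name: task_accuracy(p, g) for name, (p, g) in results.items()}
    return sum(scores.values()) / len(scores), scores


# Hypothetical toy data: two made-up tasks, not real LegalBench output.
toy_results = {
    "task_a": (["Yes", "No", "Yes", "Yes"], ["Yes", "No", "No", "Yes"]),
    "task_b": (["No", "No"], ["No", "Yes"]),
}

overall, per_task = macro_average(toy_results)
print(f"{overall:.1%}")  # prints 62.5%
```

A macro average keeps a large task from dominating the result. A leaderboard that instead pools all examples, or uses a class-balanced metric, would generally report a different number for the same predictions.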

Score history

[Chart: GPT-4o-mini score over time; y-axis 0% to 100%, x-axis Jul 24 to Jul 25]

Top models

[Bar chart: LegalBench scores for 5 models; highest value is GPT-4o-mini at 72]

Related tools (7)

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is LegalBench?
LegalBench is a set of 162 collaboratively curated legal-reasoning tasks spanning rule recall, issue spotting, application, and interpretation. It is the standard legal LLM benchmark.
What capabilities does LegalBench test?
LegalBench evaluates legal reasoning and factual recall.
What is the current top score on LegalBench?
The top reported score is 72.0% by GPT-4o-mini, across 5 models reporting (5 from frontier labs).
How can a model improve its LegalBench score?
Tools linked to LegalBench on Sophon include two entries named Legalbench RL Env (Community), plus Legalbench RL Env (Prime Intellect) and Legalbench RL Env (Prime Community). These are RL environments, datasets, and scaffolds that target this eval.
What license is LegalBench under?
LegalBench is available under CC-BY-4.0.