LegalBench
Frontier
162 collaboratively curated legal-reasoning tasks across rule-recall, issue-spotting, application, and interpretation - the standard legal LLM benchmark.
- Publisher
- Stanford University
- Capabilities
- Legal Reasoning, Factual Recall
- Domain
- legal
- Format
- HF Dataset
- Size
- 90,000 examples
- License
- CC-BY-4.0
- Published
- May 2026
- Notable for
- Benchmark for evaluating legal reasoning and factual recall in the legal domain.
- Canonical
- hazyresearch.stanford.edu/legalbench
Top score 72.0% by GPT-4o-mini - 5 models reporting (5 frontier)
Score history (5)
Top models (5)
Where it's ranked (1)
Related tools (7)
Implementations, trainers, datasets and scaffolds linked to this eval.
FAQ
- What is LegalBench?
- LegalBench is a set of 162 collaboratively curated legal-reasoning tasks across rule-recall, issue-spotting, application, and interpretation - the standard legal LLM benchmark.
- What capabilities does LegalBench test?
- LegalBench evaluates legal reasoning and factual recall.
- What is the current top score on LegalBench?
- The top reported score is 72.0% by GPT-4o-mini, across 5 models reporting (5 from frontier labs).
- How can a model improve its LegalBench score?
- Tools linked to LegalBench on Sophon include Legalbench RL Env (Community), Legalbench RL Env (Community), Legalbench RL Env (Prime Intellect), and Legalbench RL Env (Prime Community). These are RL environments, datasets, and scaffolds that target this eval.
- What license is LegalBench under?
- LegalBench is available under CC-BY-4.0.