BenchBuilder

LMSYS's automated pipeline for distilling high-quality LLM benchmarks from crowdsourced chat data (e.g. Chatbot Arena, WildChat), producing the Arena-Hard-Auto benchmark.

Type: Framework
Publisher: LMArena
Tags: Benchmark Creation
Runtime: custom
License: apache-2.0
Size: pipeline + 500-prompt Arena-Hard-Auto benchmark
Published: Nov 2023
Canonical: github.com/lmarena/arena-hard-auto

Attribution

README: github.com/lmarena/arena-hard-auto/blob/main/README.md (Apache-2.0)

Lift evidence

1

Eval	Tools known to lift	Source paper
Arena-Hard	BenchBuilder	From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

Papers

2

Introduces: From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline