BenchBuilder
LMSYS's automated pipeline for distilling high-quality LLM benchmarks from crowdsourced chat data (e.g. Chatbot Arena, WildChat), producing the Arena-Hard-Auto benchmark.
- Type
- Framework
- Publisher
- LMArena
- Runtime
- custom
- License
- apache-2.0
- Size
- pipeline + 500-prompt Arena-Hard-Auto benchmark
- Published
- Nov 2023
- Canonical
- github.com/lmarena/arena-hard-auto
Attribution
- README
- github.com/lmarena/arena-hard-auto/blob/main/README.md
- APACHE-2.0
Lift evidence
| Eval | Tools known to lift | Source paper |
|---|---|---|
| Arena-Hard | BenchBuilder | From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline |