BenchBuilder

LMSYS's automated pipeline for distilling high-quality LLM benchmarks from crowdsourced chat data (e.g. Chatbot Arena, WildChat), producing the Arena-Hard-Auto benchmark.

Type: Framework
Publisher: LMArena
Runtime: custom
License: apache-2.0
Size: pipeline + 500-prompt Arena-Hard-Auto benchmark
Published: Nov 2023
