Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

The methodology paper for Chatbot Arena, a platform that collects crowdsourced pairwise preference votes on anonymized, side-by-side LLM responses and aggregates them with the Bradley-Terry model into Elo-style rankings.
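
The Bradley-Terry model assumes model i beats model j with probability p_i / (p_i + p_j), so fitting it to pairwise votes yields one strength per model that can be rescaled into an Elo-like rating. The sketch below is illustrative only and is not the paper's or FastChat's code. It assumes a hypothetical battle format (model_a, model_b, winner) with winner in {"a", "b", "tie"}, counts a tie as half a win for each side, fits the strengths with Hunter's MM iteration (a standard maximum-likelihood algorithm, swapped in here), and maps them to ratings using the usual Elo conventions of scale 400, base 10 and a mean rating of 1000. The function name `bradley_terry_elo` is hypothetical.

```python
import math
from collections import defaultdict


def bradley_terry_elo(battles, iters=1000, tol=1e-10, scale=400.0, base=10.0, init=1000.0):
    """Hypothetical helper: fit Bradley-Terry strengths by MM and map them to Elo-style ratings.

    battles: iterable of (model_a, model_b, winner), winner in {"a", "b", "tie"}.
    Assumption: a tie counts as half a win for each side.
    The MLE exists only if every model has at least one (fractional) win and one loss.
    """
    wins = defaultdict(float)   # total (fractional) wins per model
    games = defaultdict(float)  # games[(i, j)] with i < j: number of comparisons
    models = set()
    for a, b, winner in battles:
        models.update((a, b))
        key = (a, b) if a < b else (b, a)
        games[key] += 1.0
        if winner == "a":
            wins[a] += 1.0
        elif winner == "b":
            wins[b] += 1.0
        elif winner == "tie":
            wins[a] += 0.5
            wins[b] += 0.5
        else:
            raise ValueError(f"unknown winner label: {winner!r}")
    models = sorted(models)
    for m in models:
        if wins[m] <= 0:
            raise ValueError(f"{m!r} never wins; its Bradley-Terry strength is not identifiable")

    p = {m: 1.0 for m in models}
    for _ in range(iters):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        denom = defaultdict(float)
        for (i, j), n in games.items():
            s = n / (p[i] + p[j])
            denom[i] += s
            denom[j] += s
        new_p = {m: wins[m] / denom[m] for m in models}
        # Strengths are identified only up to a common factor; fix their geometric mean to 1.
        log_mean = sum(math.log(v) for v in new_p.values()) / len(models)
        new_p = {m: v / math.exp(log_mean) for m, v in new_p.items()}
        delta = max(abs(new_p[m] - p[m]) for m in models)
        p = new_p
        if delta < tol:
            break

    # rating = init + scale * log_base(p): a gap of `scale` points means odds of base:1.
    return {m: init + scale * math.log(p[m], base) for m in models}
```

For example, if model A wins 3 of 4 votes against model B, the fitted strength ratio is 3, so A ends up about 400 * log10(3) ≈ 191 points above B, with the two ratings centered on 1000.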

Publisher: LMArena
Year: 2024
Venue: ICML
ArXiv: arxiv.org/abs/2403.04132
Code: github.com/lm-sys/FastChat
Authors: 11
Hosting: External source; license unknown

Attribution

Abstract & full text: arxiv.org/abs/2403.04132
TL;DR: semanticscholar.org/paper/53f4fb0e9972989194368faf288ff8e3cba5bd60
Code: github.com/lm-sys/FastChat

TL;DR

Semantic Scholar

This paper describes the Chatbot Arena platform, analyzes the data collected so far, and explains the tried-and-true statistical methods used for efficient and accurate evaluation and ranking of models, to establish a robust foundation for the credibility of Chatbot Arena.

Authors

Anastasios N. Angelopoulos, Banghua Zhu, Dacheng Li, Hao Zhang, Ion Stoica, Joseph E. Gonzalez, Lianmin Zheng, Michael I. Jordan, Tianle Li, Wei-Lin Chiang, Ying Sheng