The Leaderboard Illusion
A Cohere-led critique arguing that Chatbot Arena's private testing, undisclosed model retractions, and data-access asymmetries systematically advantage a handful of large labs.
- Publisher
- Cohere Labs
- Year
- 2025
- Venue
- preprint
- Authors
- 10
- Hosting
- External source (license unknown)
TL;DR (Semantic Scholar)
This work identifies systematic issues that have resulted in a distorted playing field in Chatbot Arena and offers actionable recommendations to reform Chatbot Arena's evaluation framework and promote fairer, more transparent benchmarking for the field.