The Leaderboard Illusion

A Cohere-led critique arguing Chatbot Arena's private testing, undisclosed model retractions, and data-access asymmetries systematically advantage a handful of large labs.

Publisher
Cohere Labs
Year
2025
Venue
preprint
Authors
10
Hosting
External source; license unknown


TL;DR

Semantic Scholar

This work identifies systematic issues that have distorted the playing field in Chatbot Arena and offers actionable recommendations to reform its evaluation framework and promote fairer, more transparent benchmarking for the field.
