Reliable, Reproducible, and Really Fast Leaderboards with Evalica

Evalica is an open-source toolkit designed to create reliable and reproducible leaderboards for instruction-tuned large language models with support for human and machine feedback.

Year
2024
Venue
arXiv 2024
Authors
1
Hosting
Abstract only

Attribution

Abstract & full text
arxiv.org/abs/2412.11314
TL;DR
Semantic Scholar

Abstract

The rapid advancement of natural language processing (NLP) technologies, such as instruction-tuned large language models (LLMs), urges the development of modern evaluation protocols with human and machine feedback. We introduce Evalica, an open-source toolkit that facilitates the creation of reliable and reproducible model leaderboards. This paper presents its design, evaluates its performance, and demonstrates its usability through its Web interface, command-line interface, and Python API.
