LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

UC Berkeley benchmark that continuously scrapes new LeetCode/AtCoder/CodeForces problems to give a contamination-free, time-stamped coding leaderboard.

Publisher: University of California, Berkeley
Year: 2024
Venue: NeurIPS
ArXiv: arxiv.org/abs/2403.07974
Code: github.com/LiveCodeBench/LiveCodeBench
Authors: 10
Hosting: External source, license unknown

Attribution

Abstract & full text: arxiv.org/abs/2403.07974
TL;DR: semanticscholar.org/paper/afe0998d191f3ea8490c7df100a3ffc5dcc62c5e
Code: github.com/LiveCodeBench/LiveCodeBench

Introduces 1 artifact - 1 eval

TL;DR

Semantic Scholar

This work proposes LiveCodeBench, a comprehensive and contamination-free evaluation of LLMs for code, which continuously collects new problems over time from contests across three competition platforms, namely LeetCode, AtCoder, and CodeForces.
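The reason time-stamped collection helps with contamination can be illustrated with a small sketch: if every problem carries its release date, an evaluation can be restricted to problems published after a model's training cutoff. The code below is only an illustration under stated assumptions. The `Problem` record, the `released_after` helper, the sample entries, and the cutoff date are all hypothetical and are not taken from the LiveCodeBench codebase.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    """Hypothetical record for one contest problem (not the benchmark's real schema)."""
    platform: str
    title: str
    release_date: date


def released_after(problems, cutoff):
    """Keep only problems published strictly after the cutoff date."""
    return [p for p in problems if p.release_date > cutoff]


# Made-up entries for illustration; titles and dates are not real contest data.
PROBLEMS = [
    Problem("LeetCode", "example-a", date(2023, 6, 1)),
    Problem("AtCoder", "example-b", date(2024, 1, 15)),
    Problem("CodeForces", "example-c", date(2024, 3, 2)),
]

# Assumed training cutoff for some model under evaluation.
CUTOFF = date(2023, 12, 31)

# Problems the model could not have seen during training, given the assumed cutoff.
clean = released_after(PROBLEMS, CUTOFF)
for p in clean:
    print(p.platform, p.title, p.release_date.isoformat())
```

Moving the cutoff later shrinks the evaluation set, which is why a benchmark of this kind has to keep collecting new problems over time.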

Artifacts

Evals

LiveCodeBench

Authors

Alex Gu, Armando Solar-Lezama, Fanjia Yan, Ion Stoica, King Han, Koushik Sen, Naman Jain, Sida Wang, Tianjun Zhang, Wen-Ding Li