As the capabilities of code large language models (LLMs) continue to expand, their applications across diverse code intelligence domains are increasing rapidly. However, most existing datasets evaluate only a limited set of application domains. To address this gap, we have developed FullStack Bench, a comprehensive code evaluation dataset focused on full-stack programming that covers a wide range of application domains (e.g., basic programming, data analysis, software engineering, mathematics, and machine learning). In addition, to assess multilingual programming capabilities, FullStack Bench provides real-world instructions and corresponding unit test cases in 16 widely used programming languages, designed to reflect real-world usage scenarios rather than simple translations. We also release SandboxFusion, an effective code sandbox execution tool that supports various programming languages and packages, so that performance on FullStack Bench can be evaluated efficiently. Comprehensive experimental results on FullStack Bench demonstrate the necessity and effectiveness of both FullStack Bench and SandboxFusion.
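To make the unit-test-based evaluation described in the abstract concrete, the following is a minimal sketch of how a generated solution might be checked against its test cases. It is not SandboxFusion's API: the function names `passes_unit_tests` and `pass_rate` are hypothetical, only Python is handled, and a plain subprocess with a timeout stands in for a real sandbox, which would also need resource and filesystem isolation. Scoring by a simple pass rate is likewise an assumption made for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_unit_tests(candidate_code: str, test_code: str, timeout_s: float = 10.0) -> bool:
    """Return True if the candidate code plus its tests exits with status 0.

    Hypothetical helper for illustration. A subprocess is NOT a secure
    sandbox; it only keeps failures and timeouts out of the caller's process.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Write the model's solution followed by the test assertions.
        script = Path(workdir) / "solution_with_tests.py"
        script.write_text(candidate_code + "\n\n" + test_code + "\n", encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # Treat a hang or infinite loop as a failed test.
            return False
        return result.returncode == 0


def pass_rate(outcomes: list[bool]) -> float:
    """Fraction of problems whose tests passed (assumed scoring rule)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```

A failing assertion or an uncaught exception makes the interpreter exit with a nonzero status, so the return code alone is enough to distinguish pass from fail in this sketch.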
FullStack Bench: Evaluating LLMs as Full Stack Coders
- Year
- 2024
- Venue
- arXiv 2024
- Authors
- 55
- Hosting
- Abstract only (ARXIV-DEFAULT)
Attribution
- Abstract & full text
- arxiv.org/abs/2412.00535v6 (ARXIV-DEFAULT)
- TL;DR
- Semantic Scholar
Authors
Zihan Wang, Jun Zhang, Li Chen, Bowen Li, Hongxia Yang, He Zhu, Qi Liu, Jie Chen, Bo Li, Siwei Wang, Tao Sun, Jiaheng Liu, Yifan Sun, Yuyu Zhang, Bytedance-Seed-Foundation-Code-Team, Yao Cheng, Jianfeng Chen, Liyu Chen, Wentao Chen, Zhengyu Chen, Shijie Geng, Aoyan Li, Linyi Li, Boyi Liu, Kaibo Liu, Shukai Liu, Siyao Liu, Tianyi Liu, Tingkai Liu, Yongfei Liu, Rui Long, Jing Mai, Guanghan Ning, Z. Y. Peng, Kai Shen, Jiahao Su, Jing Su, Yunzhe Tao, Guoyin Wang, Xuwu Wang, Yite Wang, Jinxiang Xia, Liang Xiang, Xia Xiao, Yongsheng Xiao, Chenguang Xi, Shulin Xin, Jingjing Xu, Shikun Xu, Jack Yang, Yingxiang Yang, Jianbo Yuan, Yufeng Zhang, Shen Zheng, Ming Zhu