Recent advancements in code large language models (LLMs) have demonstrated remarkable capabilities in code generation and understanding. However, it remains challenging to build a code LLM that combines comprehensive performance with high efficiency. The open-source community has released many models that attempt to break the trade-off between performance and efficiency, such as the Qwen Coder series and the DeepSeek Coder series. This paper introduces another attempt in this area, namely Ling-Coder-Lite. We leverage the efficient Mixture-of-Experts (MoE) architecture along with a set of high-quality data curation methods (especially those based on program analytics) to build an efficient yet powerful code LLM. Ling-Coder-Lite exhibits on-par performance on 12 representative coding benchmarks compared to state-of-the-art models of similar size, such as Qwen2.5-Coder-7B and DeepSeek-Coder-V2-Lite, while offering competitive latency and throughput. In practice, we achieve a 50% reduction in deployment resources compared to a similar-sized dense model without performance loss. To facilitate further research and development in this area, we open-source our models as well as a substantial portion of high-quality data for the annealing and post-training stages. The models and data can be accessed at https://huggingface.co/inclusionAI/Ling-Coder-lite.
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM
- Year: 2025
- Venue: arXiv 2025
- Authors: 32
Attribution
- Abstract & full text: arxiv.org/abs/2503.17793
- TL;DR: Semantic Scholar
Authors
Jiaolong Yang, Jun Zhou, Zheng Li, Chaoyu Chen, Bingchang Liu, Zi Gong, Hang Yu, Jianguo Li, Wei Zhang, Zhengyu He, Jian Wu, Chen Chen, Shijie Lian, Peng Di, Wenjie Yang, Ling Team, Siba Chen, Zhenduo Zhang, Hailin Zhao, Junpeng Fang, Qing Cui, Ting Guo, Cong Li, Codefuse, Wenting Cai, Yuchen Cao, Yang Huang, Songshan Luo, Shuo Mao, Min Shen, Tong Ye, Xunjin Zheng