Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch on 2.6 trillion tokens. Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks such as MMLU, CMMLU, GSM8K, and HumanEval. Furthermore, Baichuan 2 excels in vertical domains such as medicine and law. We will release all pre-training model checkpoints to help the research community better understand the training dynamics of Baichuan 2.
Baichuan 2: Open Large-scale Language Models
Baichuan 2, a series of 7 billion and 13 billion parameter multilingual LLMs, achieves competitive performance on benchmarks and excels in specialized domains, with open-source pre-training checkpoints released.
- Year: 2023
- Venue: arXiv 2023
- Authors: 55
Attribution
- Abstract & full text: arxiv.org/abs/2309.10305v4
- TL;DR: Semantic Scholar
Authors
Ce Bian, Jiaming Ji, Juntao Dai, Mickel Liu, Ruiyang Sun, Xuehai Pan, Tianyu Li, Chenxu Lv, Fan Yang, Wei Cheng, Haizhou Zhao, Hang Xu, Tao Zhang, Xin Yu, Feng Wang, WeiPeng Chen, Bin Xiao, Xiaoxi Chen, Hui Liu, Tianpeng Li, Haoze Sun, Zenan Zhou, Bingning Wang, Jian Xie, Borong Zhang, Peidong Guo, Kun Fang, Mang Wang, Yanjun Shen, MingAn Lin, Feng Liu, Liang Song, Xiangrong Zeng, Yupeng Zhang, Da Pan, Guosheng Dong, Hongda Zhang, Lei Su, Xin Men, Zhiying Wu, Aiyuan Yang, Chao Yin, Dian Wang, Dong Yan, Fei Deng, Guangwei Ai, Lifeng Liu, Liyun Ru, Luyao Ma, Nuolan Nie, Xiaochuan Wang, Yiding Wang, Yiyu Li, Youxin Jiang, Yuchen Gao