Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many tasks that require inference over an input context. To address this, we have trained XGen, a series of 7B parameter models on up to 8K sequence length for up to 1.5T tokens. We have also finetuned the XGen models on public-domain instructional data, creating their instruction-tuned counterparts (XGen-Inst). We open-source our models for both research advancements and commercial applications. Our evaluation on standard benchmarks shows that XGen models achieve comparable or better results when compared with state-of-the-art open-source LLMs. Our targeted evaluation on long sequence modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence open-source LLMs.
XGen-7B Technical Report
XGen, a series of open-source 7B parameter models trained on up to 8K sequence lengths, outperforms other open-source LLMs in long sequence modeling tasks.
- Year: 2023
- Venue: arXiv 2023
- Authors: 25
Attribution
- Abstract & full text: arxiv.org/abs/2309.03450
- TL;DR: Semantic Scholar
Authors
Caiming Xiong, Silvio Savarese, Shafiq Joty, Jesse Vig, Tian Xie, Erik Nijkamp, Hiroaki Hayashi, Yingbo Zhou, Philippe Laban, Prafulla Kumar Choubey, Chen Xing, Bo Pang, Senthil Purushwalkam, Chien-Sheng Wu, Wojciech Kryściński, Rui Meng, Ben Krause, Ye Liu, Semih Yavuz, Tong Niu, Congying Xia, Lidiya Murakhovs'ka, Alex Fabbri, Lifu Tu, Meghana Bhat