We introduce RecurrentGemma, a family of open language models that uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-size state, which reduces memory use and enables efficient inference on long sequences. We provide two model sizes, with 2B and 9B parameters, and provide pre-trained and instruction-tuned variants of both. Our models achieve performance comparable to similarly sized Gemma baselines despite being trained on fewer tokens.
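The memory claim in the abstract can be illustrated with a toy sketch. The code below is not the Griffin layer or any RecurrentGemma API: it assumes a generic elementwise linear recurrence, and the names `linear_recurrence_step`, `run`, `decay`, and `window` are hypothetical. It only shows that a recurrent state and a local attention window stay a fixed size as the sequence grows, while a global attention cache grows with sequence length.

```python
# Illustrative sketch only: a generic linear recurrence, NOT the Griffin layer.
# All names and parameter values here are assumptions made for the example.
from collections import deque


def linear_recurrence_step(state, x, decay=0.9):
    """Elementwise update: h_t = decay * h_{t-1} + (1 - decay) * x_t."""
    return [decay * h + (1.0 - decay) * xi for h, xi in zip(state, x)]


def run(sequence, width, window=4):
    state = [0.0] * width               # recurrent state: size never changes
    local_cache = deque(maxlen=window)  # local attention keeps only the last `window` inputs
    global_cache = []                   # a global-attention cache grows with the sequence
    for x in sequence:
        state = linear_recurrence_step(state, x)
        local_cache.append(x)
        global_cache.append(x)
    return len(state), len(local_cache), len(global_cache)


sizes = run([[1.0, 2.0]] * 100, width=2)
print(sizes)  # (2, 4, 100)
```

After 100 steps, the recurrent state still holds 2 values and the local window 4 entries, while the global cache holds all 100 inputs.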
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
RecurrentGemma, using the Griffin architecture, achieves excellent language performance with efficient memory usage and fewer training tokens than Gemma-2B.
- Year
- 2024
- Venue
- arXiv 2024
- Authors
- 62
Attribution
- Abstract & full text
- arxiv.org/abs/2404.07839v2
- TL;DR
- Semantic Scholar
Authors
David Budden, Demis Hassabis, Koray Kavukcuoglu, Sebastian Borgeaud, Adam Paszke, Johan Ferret, Shreya Pathak, Morgane Riviere, Thomas Mesnard, Alek Andreev, Cassidy Hardin, Robert Dadashi, Léonard Hussenot, Armand Joulin, Olivier Bachem, Clement Farabet, Tris Warkentin, Noah Fiedel, Soham De, Samuel L. Smith, Kathleen Kenealy, Yutian Chen, Yee Whye Teh, Arnaud Doucet, Meg Risdal, Trevor Gale, Aleksandar Botev, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Sertan Girgin, Surya Bhupatiraju, Laurent Sifre, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Evan Senter, Srivatsan Srinivasan, Guillaume Desjardins, Sharad Vikram, Charlie Chen, Andy Brock, Antonia Paterson, Jenny Brennan, Raj Gundluru, Nesh Devanathan, Paul Mooney, Nilay Chauhan, Phil Culliton, Luiz Gustavo Martins, Elisa Bandy, David Huntsperger, Glenn Cameron, Arthur Zucker, Ludovic Peran, Minh Giang, Zoubin Ghahramani, Raia Hadsell, Nando de Freitas