0

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

RecurrentGemma, using Griffin architecture, achieves excellent language performance with efficient memory usage and fewer training tokens compared to Gemma-2B.

Year
2024
Venue
arXiv 2024
Authors
62
Hosting
Abstract onlyARXIV-DEFAULT

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2404.07839v2ARXIV-DEFAULT
TL;DR
Semantic Scholar
Attribution policy →

Abstract

We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.

Authors

62