Designing Large Foundation Models for Efficient Training and Inference: A Survey

This survey examines techniques for compressing large language models to reduce their size and computational requirements while maintaining performance, focusing on methods such as quantization, knowledge distillation, and pruning, as well as system-level optimizations.

Year: 2024
Venue: arXiv 2024
ArXiv: arxiv.org/abs/2409.01990
Authors: 8
Hosting: Abstract only

Attribution

Abstract & full text: arxiv.org/abs/2409.01990v5
TL;DR: Semantic Scholar

Abstract

This paper focuses on modern technologies for efficient training and inference of foundation models and presents them from two perspectives: model design and system design. The two perspectives optimize LLM training and inference in different ways to save computational resources, making LLMs more efficient, affordable, and accessible. The paper list repository is available at https://github.com/NoakLiu/Efficient-Foundation-Models-Survey.

Authors

Yite Wang, Dong Liu, Zhongwei Wan, Ying Nian Wu, Jing Wu, Yanxuan Yu, Sina Alinejad, Benjamin Lengerich