Elias Frantar
Authored papers
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models
arXiv 2024
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
arXiv 2023
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
arXiv 2023
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
arXiv 2023
Sparse Fine-tuning for Inference Acceleration of Large Language Models
arXiv 2023
Error Feedback Can Accurately Compress Preconditioners
arXiv 2023
ZipLM: Inference-Aware Structured Pruning of Language Models
QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models
arXiv 2023
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
arXiv 2022
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
arXiv 2022
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning
arXiv 2022
L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning
arXiv 2022
M-FAC: Efficient Matrix-Free Approximations of Second-Order Information
NeurIPS 2021