Dan Alistarh
- Papers
- 36
Authored papers (36)
Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation
arXiv 2026
MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning
arXiv 2026
MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization
arXiv 2026
DASH: Faster Shampoo via Batched Block Preconditioning and Efficient Inverse-Root Solvers
arXiv 2026
Quartet: Native FP4 Training Can Be Optimal for Large Language Models
arXiv 2025
WUSH: Near-Optimal Adaptive Transforms for LLM Quantization
arXiv 2025
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
arXiv 2025
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations
arXiv 2025
HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs
arXiv 2025
DarwinLM: Evolutionary Structured Pruning of Large Language Models
arXiv 2025
Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models
arXiv 2025
Efficient Data Selection at Scale via Influence Distillation
arXiv 2025
Scalable Mechanistic Neural Networks for Differential Equations and Machine Learning
arXiv 2024
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models
arXiv 2024
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
arXiv 2024
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
arXiv 2024
Panza: Design and Analysis of a Fully-Local Personalized Text Writing Assistant
arXiv 2024
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
arXiv 2024
EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search
arXiv 2024
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
arXiv 2024
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
arXiv 2024
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
arXiv 2023
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
arXiv 2023
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
arXiv 2023
QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models
arXiv 2023
Error Feedback Can Accurately Compress Preconditioners
arXiv 2023
Sparse Fine-tuning for Inference Acceleration of Large Language Models
arXiv 2023
SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks
arXiv 2023
ZipLM: Inference-Aware Structured Pruning of Language Models
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
arXiv 2022
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
arXiv 2022
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning
arXiv 2022
L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning
arXiv 2022
CrAM: A Compression-Aware Minimizer
arXiv 2022
M-FAC: Efficient Matrix-Free Approximations of Second-Order Information
NeurIPS 2021
Model compression via distillation and quantization
Affiliations
Frequent co-authors
10 from 36 papers