
Dan Alistarh

Papers
36

Authored papers

36

Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation

arXiv 2026

2026

MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning

arXiv 2026

2026

MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization

arXiv 2026

2026

DASH: Faster Shampoo via Batched Block Preconditioning and Efficient Inverse-Root Solvers

arXiv 2026

2026

Quartet: Native FP4 Training Can Be Optimal for Large Language Models

arXiv 2025

2025

WUSH: Near-Optimal Adaptive Transforms for LLM Quantization

arXiv 2025

2025

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

arXiv 2025

2025

QuEST: Stable Training of LLMs with 1-Bit Weights and Activations

arXiv 2025

2025

HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs

arXiv 2025

2025

DarwinLM: Evolutionary Structured Pruning of Large Language Models

arXiv 2025

2025

Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models

arXiv 2025

2025

Efficient Data Selection at Scale via Influence Distillation

arXiv 2025

2025

Scalable Mechanistic Neural Networks for Differential Equations and Machine Learning

arXiv 2024

2025

MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models

arXiv 2024

2024

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

arXiv 2024

2024

PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression

arXiv 2024

2024

Panza: Design and Analysis of a Fully-Local Personalized Text Writing Assistant

arXiv 2024

2024

RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation

arXiv 2024

2024

EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search

arXiv 2024

2024

MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence

arXiv 2024

2024

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

arXiv 2024

2024

SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

arXiv 2023

2023

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

arXiv 2023

2023

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models

arXiv 2023

2023

QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models

arXiv 2023

2023

Error Feedback Can Accurately Compress Preconditioners

arXiv 2023

2023

Sparse Fine-tuning for Inference Acceleration of Large Language Models

arXiv 2023

2023

SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks

arXiv 2023

2023

ZipLM: Inference-Aware Structured Pruning of Language Models

2023

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

arXiv 2022

2022

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

arXiv 2022

2022

Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning

arXiv 2022

2022

L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning

arXiv 2022

2022

CrAM: A Compression-Aware Minimizer

arXiv 2022

2022

M-FAC: Efficient Matrix-Free Approximations of Second-Order Information

NeurIPS 2021

2021

Model compression via distillation and quantization

2018

Affiliations

No known affiliations.

Frequent co-authors

10

from 36 papers