Attribution
Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models
arXiv 2025
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
arXiv 2024
Fast Inference of Mixture-of-Experts Language Models with Offloading
arXiv 2023
from 3 papers
Dan Alistarh
Denis Kuznedelev
Vladimir Malinovskii
Alina Shutova
Artyom Eliseev
Ivan Ermakov
Ivan Ilin
Kai Yi
Konstantin Burlachenko
Nikita Surkov