Attribution
Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
arXiv 2024
ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals
from 2 papers
Kaushik Roy
Gobinda Saha
Sakshi Choudhary
Sayeh Sharify
Xin Wang