Attribution
A²ATS: Retrieval-Based KV Cache Reduction via Windowed Rotary Position Embedding and Query-Aware Vector Quantization
arXiv 2025
CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification
arXiv 2024
from 2 papers
Junhui He
Qingan Li
Shangyu Wu
Junna Xing
Nan Wang
Peng Zhou
Qiang Liu
Rui Xu
Weidong Wen