Attribution
OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
arXiv 2026
Introspective Diffusion Language Models
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
arXiv 2023
from 3 papers
Shuaiwen Leon Song
Zhongzhu Zhou
Ben Athiwaratkun
Xiaoxia Wu
Chenfeng Xu
Fan Lai
Haojun Xia
James Zou
Jisen Li
Junxiong Wang