Attribution
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading (arXiv 2025)
COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models (arXiv 2023)
from 2 papers
Bo Yuan
Anima Anandkumar
professor
Beidi Chen
Cheng Luo
Hanshi Sun
Jian Ren
Jiawei Zhao
Junjie Hu
Miao Yin
Wen Xiao