Hanshi Sun
5 papers
Authored papers
R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration
arXiv 2025
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
arXiv 2025
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
arXiv 2024
Fast Best-of-N Decoding via Speculative Rejection
arXiv 2024
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
arXiv 2024
Affiliations
No known affiliations.
Frequent co-authors
10 from 5 papers