Zhuoming Chen
- Papers: 9
Authored papers
Kinetics: Rethinking Test-Time Scaling Laws
arXiv 2025
CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models
arXiv 2025
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding
arXiv 2024
MagicPIG: LSH Sampling for Efficient LLM Generation
arXiv 2024
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
arXiv 2024
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
arXiv 2024
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
arXiv 2024
Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training
arXiv 2024
Sirius: Contextual Sparsity with Correction for Efficient LLMs
arXiv 2024
Frequent co-authors
10 from 9 papers