Tianle Cai
- Papers: 14
Authored papers (14)
In-Place Test-Time Training
arXiv 2026
A Survey on Latent Reasoning
arXiv 2025
Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
arXiv 2025
CommVQ: Commutative Vector Quantization for KV Cache Compression
arXiv 2025
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
arXiv 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
arXiv 2024
SnapKV: LLM Knows What You are Looking for Before Generation
arXiv 2024
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
arXiv 2024
Training-Free Activation Sparsity in Large Language Models
arXiv 2024
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
CVPR 2024
BitDelta: Your Fine-Tune May Only Be Worth One Bit
arXiv 2024
Large Language Models as Tool Makers
arXiv 2023
REST: Retrieval-Based Speculative Decoding
arXiv 2023
What Makes Convolutional Models Great on Long Sequence Modeling?
arXiv 2022
Frequent co-authors
10 from 14 papers