Jiaming Tang
5 papers
Authored papers
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
arXiv 2025
VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference
arXiv 2025
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
arXiv 2024
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
arXiv 2024
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
arXiv 2023
Affiliations
No known affiliations.
Frequent co-authors
10 from 5 papers