Guangxuan Xiao
Authored papers (15)
StreamingVLM: Real-Time Understanding for Infinite Video Streams
arXiv 2025
XAttention: Block Sparse Attention with Antidiagonal Scoring
arXiv 2025
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
arXiv 2025
Optimizing Mixture of Block Attention
arXiv 2025
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
arXiv 2024
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
arXiv 2024
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
arXiv 2024
Retrieval Head Mechanistically Explains Long-Context Factuality
arXiv 2024
BitDelta: Your Fine-Tune May Only Be Worth One Bit
arXiv 2024
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
arXiv 2024
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
arXiv 2023
Efficient Streaming Language Models with Attention Sinks
arXiv 2023
Offsite-Tuning: Transfer Learning without Full Model
arXiv 2023
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
arXiv 2023
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
arXiv 2022
Frequent co-authors: 10 (from 15 papers)