Haotian Tang
- Papers: 12
Authored papers (12)
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
arXiv 2025
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
arXiv 2024
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
arXiv 2024
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
arXiv 2024
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
arXiv 2024
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
arXiv 2024
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
arXiv 2023
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
arXiv 2023
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
arXiv 2022
TorchSparse: Efficient Point Cloud Inference Engine
arXiv 2022
End-to-End Entity Detection with Proposer and Regressor
arXiv 2022
Type-supervised sequence labeling based on the heterogeneous star graph for named entity recognition
arXiv 2022
Affiliations
Frequent co-authors
10 from 12 papers