Song Han

Papers
57

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

57

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

arXiv 2026

2026

Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing

arXiv 2026

2026

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

arXiv 2026

2026

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

arXiv 2026

2026

Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs

arXiv 2026

2026

StreamingVLM: Real-Time Understanding for Infinite Video Streams

arXiv 2025

2026

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

arXiv 2026

2026

Scaling RL to Long Videos

arXiv 2025

2025

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

arXiv 2025

2025

XAttention: Block Sparse Attention with Antidiagonal Scoring

arXiv 2025

2025

LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

arXiv 2025

2025

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

arXiv 2025

2025

VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference

arXiv 2025

2025

Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter

arXiv 2025

2025

ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference

arXiv 2025

2025

Optimizing Mixture of Block Attention

arXiv 2025

2025

FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos

arXiv 2025

2025

Fast-dLLM v2: Efficient Block-Diffusion LLM

arXiv 2025

2025

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

arXiv 2025

2025

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

arXiv 2025

2025

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

arXiv 2025

2025

Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation

arXiv 2025

2025

DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer

arXiv 2025

2025

SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity

arXiv 2025

2025

Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

arXiv 2025

2025

Scaling Vision Pre-Training to 4K Resolution

CVPR 2025

2025

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

arXiv 2024

2024

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

arXiv 2024

2024

Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

arXiv 2024

2024

NVILA: Efficient Frontier Visual Language Models

CVPR 2025

2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

arXiv 2024

2024

COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training

arXiv 2024

2024

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

arXiv 2024

2024

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

arXiv 2024

2024

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

CVPR 2024

2024

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

arXiv 2024

2024

BitDelta: Your Fine-Tune May Only Be Worth One Bit

arXiv 2024

2024

Wolf: Captioning Everything with a World Summarization Framework

arXiv 2024

2024

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

arXiv 2023

2023

VILA: On Pre-training for Visual Language Models

CVPR 2024

2023

Efficient Streaming Language Models with Attention Sinks

arXiv 2023

2023

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention

arXiv 2023

2023

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

arXiv 2023

2023

Offsite-Tuning: Transfer Learning without Full Model

arXiv 2023

2023

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

arXiv 2022

2022

TorchSparse: Efficient Point Cloud Inference Engine

arXiv 2022

2022

Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation

CVPR 2022

2022

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

arXiv 2022

2022

Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models

arXiv 2022

2022

TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning

2020

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

2020

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

2020

Once-for-All: Train One Network and Specialize it for Efficient Deployment

arXiv 2019

2019

AMC: AutoML for Model Compression and Acceleration on Mobile Devices

2018

Path-Level Network Transformation for Efficient Architecture Search

2018

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

2017

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

arXiv 2016

2016

Affiliations

No known affiliations.

Frequent co-authors

10

from 57 papers