Kai Han
- Papers: 39
Authored papers (39)
An Empirical Study of World Model Quantization
arXiv 2026
Surgical Post-Training: Cutting Errors, Keeping Knowledge
arXiv 2026
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
arXiv 2025
ROOT: Robust Orthogonalized Optimizer for Neural Network Training
arXiv 2025
VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse
arXiv 2025
Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping
ICCV 2025
v-CLR: View-Consistent Learning for Open-World Instance Segmentation
CVPR 2025 1
Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts
arXiv 2025
EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization
arXiv 2025
Mr. DETR: Instructive Multi-Route Training for Detection Transformers
CVPR 2025 1
SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution
arXiv 2024
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
arXiv 2024
Rethinking Optimization and Architecture for Tiny Language Models
arXiv 2024
Data-efficient Large Vision Models through Sequential Autoregression
arXiv 2024
Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs
arXiv 2024
ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
arXiv 2024
Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning
arXiv 2024
CusConcept: Customized Visual Concept Decomposition with Diffusion Models
arXiv 2024
RegionDrag: Fast Region-Based Image Editing with Diffusion Models
arXiv 2024
PruneVid: Visual Token Pruning for Efficient Video Large Language Models
arXiv 2024
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
arXiv 2024
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
arXiv 2024
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
arXiv 2024
SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning
arXiv 2024
Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games
arXiv 2024
SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation
arXiv 2024
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
arXiv 2024
EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models
arXiv 2024
Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism
GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks?
arXiv 2023
ViCo: Plug-and-play Visual Condition for Personalized Text-to-image Generation
arXiv 2023
Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation
ICCV 2023 1
Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network
ICCV 2023 1
Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs
arXiv 2023
Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery
ICCV 2023 1
Transformer in Transformer
NeurIPS 2021 12
Augmented Shortcuts for Vision Transformers
NeurIPS 2021 12
Open-Set Recognition: a Good Closed-Set Classifier is All You Need?
GhostNet: More Features from Cheap Operations