Kai Han

Papers
39

Authored papers

Surgical Post-Training: Cutting Errors, Keeping Knowledge

arXiv 2026

2026

An Empirical Study of World Model Quantization

arXiv 2026

2026

ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models

arXiv 2025

2025

ROOT: Robust Orthogonalized Optimizer for Neural Network Training

arXiv 2025

2025

VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse

arXiv 2025

2025

v-CLR: View-Consistent Learning for Open-World Instance Segmentation

CVPR 2025

2025

Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts

arXiv 2025

2025

Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping

ICCV 2025

2025

EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization

arXiv 2025

2025

Mr. DETR: Instructive Multi-Route Training for Detection Transformers

CVPR 2025

2024

SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution

arXiv 2024

2024

Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning

arXiv 2024

2024

Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

arXiv 2024

2024

SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning

arXiv 2024

2024

Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games

arXiv 2024

2024

SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation

arXiv 2024

2024

Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?

arXiv 2024

2024

EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models

arXiv 2024

2024

Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning

arXiv 2024

2024

CusConcept: Customized Visual Concept Decomposition with Diffusion Models

arXiv 2024

2024

BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities

arXiv 2024

2024

Rethinking Optimization and Architecture for Tiny Language Models

arXiv 2024

2024

Data-efficient Large Vision Models through Sequential Autoregression

arXiv 2024

2024

Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs

arXiv 2024

2024

ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction

arXiv 2024

2024

RegionDrag: Fast Region-Based Image Editing with Diffusion Models

arXiv 2024

2024

PruneVid: Visual Token Pruning for Efficient Video Large Language Models

arXiv 2024

2024

Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

arXiv 2024

2024

Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism

2023

GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks?

arXiv 2023

2023

Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network

ICCV 2023

2023

Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs

arXiv 2023

2023

Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery

ICCV 2023

2023

ViCo: Plug-and-play Visual Condition for Personalized Text-to-image Generation

arXiv 2023

2023

Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation

ICCV 2023

2023

Transformer in Transformer

NeurIPS 2021

2021

Augmented Shortcuts for Vision Transformers

NeurIPS 2021

2021

Open-Set Recognition: a Good Closed-Set Classifier is All You Need?

2021

GhostNet: More Features from Cheap Operations

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 39 papers