Hao Zhang
Assistant professor at UC San Diego; co-founder of LMSys and Snowflake-acquired LMNet.ai; key contributor to vLLM, Vicuna, Chatbot Arena, and Alpa.
- Role: Professor
- Currently at: University of California, San Diego
- Twitter: twitter.com/haoailab
- GitHub: github.com/comaniac
- Scholar: scholar.google.com/citations
- Papers: 95
Authored papers (95)
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
arXiv 2026
SkyReels-V3 Technique Report
arXiv 2026
d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation
arXiv 2026
Learning Versatile Humanoid Manipulation with Touch Dreaming
arXiv 2026
HDINO: A Concise and Efficient Open-Vocabulary Detector
arXiv 2026
LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model
arXiv 2026
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
arXiv 2026
Gecko: An Efficient Neural Architecture Inherently Processing Sequences with Arbitrary Lengths
arXiv 2026
Kimi K2.5: Visual Agentic Intelligence
arXiv 2026
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
arXiv 2025
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
arXiv 2025
SkyReels-V2: Infinite-length Film Generative Model
arXiv 2025
Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
arXiv 2025
Fast Video Generation with Sliding Tile Attention
arXiv 2025
Muon is Scalable for LLM Training
arXiv 2025
lmgame-Bench: How Good are LLMs at Playing Games?
arXiv 2025
Kimi-VL Technical Report
arXiv 2025
Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL
arXiv 2025
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
arXiv 2025
ReFoRCE: A Text-to-SQL Agent with Self-Refinement, Format Restriction, and Column Exploration
arXiv 2025
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
arXiv 2025
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
arXiv 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
arXiv 2025
In-2-4D: Inbetweening from Two Single-View Images to 4D Generation
arXiv 2025
MediAug: Exploring Visual Augmentation in Medical Imaging
arXiv 2025
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations
arXiv 2025
Fast and Accurate Causal Parallel Decoding using Jacobi Forcing
arXiv 2025
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving
arXiv 2025
Fast-dLLM v2: Efficient Block-Diffusion LLM
arXiv 2025
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
arXiv 2025
Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs
arXiv 2025
Scaling Speculative Decoding with Lookahead Reasoning
arXiv 2025
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
arXiv 2025
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
arXiv 2025
Scaling Language-Centric Omnimodal Representation Learning
arXiv 2025
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
arXiv 2025
Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-Bench
arXiv 2025
PathoHR: Breast Cancer Survival Prediction on High-Resolution Pathological Images
arXiv 2025
GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning
arXiv 2025
ByteLoom: Weaving Geometry-Consistent Human-Object Interactions through Progressive Curriculum Learning
arXiv 2025
Efficient Long-context Language Model Training by Core Attention Disaggregation
arXiv 2025
MedConv: Convolutions Beat Transformers on Long-Tailed Bone Density Prediction
arXiv 2025
Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation
arXiv 2025
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
ICML
LLaVA-OneVision: Easy Visual Task Transfer
arXiv 2024
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
arXiv 2024
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding
arXiv 2024
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
arXiv 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
arXiv 2024
CLLMs: Consistency Large Language Models
arXiv 2024
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
arXiv 2024
High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models
arXiv 2024
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
CVPR 2025
CoIR: A Comprehensive Benchmark for Code Information Retrieval Models
arXiv 2024
Efficient LLM Scheduling by Learning to Rank
arXiv 2024
Towards Natural Image Matting in the Wild via Real-Scenario Prior
arXiv 2024
S3O: A Dual-Phase Approach for Reconstructing Dynamic Shape and Skeleton of Articulated Objects from Single Monocular Video
arXiv 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
arXiv 2024
Fast On-device LLM Inference with NPUs
arXiv 2024
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
arXiv 2024
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer
arXiv 2024
Parameter-Efficient Conversational Recommender System as a Language Processing Task
arXiv 2024
Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up
arXiv 2024
HRVMamba: High-Resolution Visual State Space Model for Dense Prediction
arXiv 2024
Efficiently Serving LLM Reasoning Programs with Certaindex
arXiv 2024
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
NeurIPS
Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality
blog
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
arXiv 2023
detrex: Benchmarking Detection Transformers
arXiv 2023
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
arXiv 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
arXiv 2023
A Simple Framework for Open-Vocabulary Segmentation and Detection
ICCV 2023
Detection Transformer with Stable Matching
ICCV 2023
Interfacing Foundation Models' Embeddings
arXiv 2023
UKP-SQuARE v3: A Platform for Multi-Agent QA Research
arXiv 2023
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
CVPR 2023
Online Speculative Decoding
arXiv 2023
MS-DETR: Natural Language Video Localization with Sampling Moment-Moment Interaction
arXiv 2023
Visual In-Context Prompting
CVPR 2024
Efficient Memory Management for Large Language Model Serving with PagedAttention
arXiv 2023
Semantic-SAM: Segment and Recognize Anything at Any Granularity
arXiv 2023
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
arXiv 2023
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training
arXiv 2023
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
arXiv 2023
DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion
ICCV 2023
TLM: Token-Level Masking for Transformers
arXiv 2023
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation
MPCFormer: fast, performant and private Transformer inference with MPC
arXiv 2022
DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
Manifoldron: Direct Space Partition via Manifold Discovery
arXiv 2022
Learning Mesh Representations via Binary Space Partitioning Tree Networks
arXiv 2021
Contrastive Attraction and Contrastive Repulsion for Representation Learning
Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities
arXiv 2020
Tool contributions
Affiliations (1)
Frequent co-authors
10 from 95 papers