Hanwang Zhang

Papers
24

Authored papers

24

MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons

arXiv 2026

2026

Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization

ICCV 2025

2025

VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models

arXiv 2025

2025

WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation

arXiv 2025

2025

On Path to Multimodal Generalist: General-Level and General-Bench

arXiv 2025

2025

DEPO: Dual-Efficiency Preference Optimization for LLM Agents

arXiv 2025

2025

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

arXiv 2024

2024

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

arXiv 2024

2024

Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction

arXiv 2024

2024

Exploring Diffusion Time-steps for Unsupervised Representation Learning

arXiv 2024

2024

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

CVPR 2024 1

2024

Towards Semantic Equivalence of Tokenization in Multimodal LLM

arXiv 2024

2024

ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models

arXiv 2024

2024

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

arXiv 2024

2024

MVGamba: Unify 3D Content Generation as State Space Sequence Modeling

arXiv 2024

2024

Fast Diffusion Model

arXiv 2023

2023

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

arXiv 2023

2023

DisCo: Disentangled Control for Realistic Human Dance Generation

CVPR 2024 1

2023

Equivariant Similarity for Vision-Language Foundation Models

ICCV 2023 1

2023

Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection

arXiv 2023

2023

Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground

ICCV 2023 1

2022

Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning

arXiv 2022

2022

Prompt-aligned Gradient for Prompt Tuning

ICCV 2023 1

2022

KQA Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base

ACL 2022 5

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 24 papers