Wenbo Hu

Papers
23

Authored papers

23

Pixal3D: Pixel-Aligned 3D Generation from Images

arXiv 2026

2026

VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control

arXiv 2026

2026

Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

arXiv 2026

2026

OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

arXiv 2026

2026

MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE

arXiv 2026

2026

Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels

arXiv 2026

2026

TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models

ICCV 2025

2025

NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors

ICCV 2025

2025

G^2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning

arXiv 2025

2025

Rolling Forcing: Autoregressive Long Video Diffusion in Real Time

arXiv 2025

2025

MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence

arXiv 2025

2025

Interleaving Reasoning for Better Text-to-Image Generation

arXiv 2025

2025

GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

ICCV 2025

2025

StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos

arXiv 2024

2024

ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis

arXiv 2024

2024

Matryoshka Query Transformer for Large Vision-Language Models

arXiv 2024

2024

CV-VAE: A Compatible Video VAE for Latent Generative Video Models

arXiv 2024

2024

NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images

CVPR 2025

2024

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

CVPR 2025

2024

Verbalized Representation Learning for Interpretable Few-Shot Generalization

ICCV 2025

2024

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

arXiv 2023

2023

DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation

arXiv 2023

2023

StackVAE-G: An efficient and interpretable model for time series anomaly detection

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 23 papers