Wenbo Hu
- Papers: 23
Authored papers (23)
Pixal3D: Pixel-Aligned 3D Generation from Images
arXiv 2026
VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
arXiv 2026
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis
arXiv 2026
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks
arXiv 2026
MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE
arXiv 2026
Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels
arXiv 2026
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
ICCV 2025
NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors
ICCV 2025
G^2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
arXiv 2025
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
arXiv 2025
MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence
arXiv 2025
Interleaving Reasoning for Better Text-to-Image Generation
arXiv 2025
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
ICCV 2025
StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos
arXiv 2024
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
arXiv 2024
Matryoshka Query Transformer for Large Vision-Language Models
arXiv 2024
CV-VAE: A Compatible Video VAE for Latent Generative Video Models
arXiv 2024
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
CVPR 2025
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
CVPR 2025
Verbalized Representation Learning for Interpretable Few-Shot Generalization
ICCV 2025
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
arXiv 2023
DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation
arXiv 2023
StackVAE-G: An efficient and interpretable model for time series anomaly detection
arXiv 2021
Frequent co-authors
10 from 23 papers