Hang Xu
- Papers: 25
Authored papers
SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation
arXiv 2026
Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation
arXiv 2026
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
arXiv 2025
DualTHOR: A Dual-Arm Humanoid Simulation Platform for Contingency-Aware Planning
arXiv 2025
SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning
arXiv 2025
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors
ICCV 2025
DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation
arXiv 2025
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement
arXiv 2025
Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?
arXiv 2025
ACE: Anti-Editing Concept Erasure in Text-to-Image Models
CVPR 2025
Semantic-decoupled Spatial Partition Guided Point-supervised Oriented Object Detection
arXiv 2025
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning
arXiv 2024
Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection
arXiv 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
CVPR 2025
Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion
arXiv 2024
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
arXiv 2024
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping
Graph-based Topology Reasoning for Driving Scenes
arXiv 2023
A Survey on Video Diffusion Models
arXiv 2023
Baichuan 2: Open Large-scale Language Models
arXiv 2023
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
arXiv 2023
MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
arXiv 2023
PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection
ICCV 2023
Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models
arXiv 2023
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
arXiv 2022
Affiliations
Frequent co-authors
10 from 25 papers