Hang Xu

Authored papers

25

SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation

arXiv 2026

2026

Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

arXiv 2026

2026

Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

arXiv 2025

2025

DualTHOR: A Dual-Arm Humanoid Simulation Platform for Contingency-Aware Planning

arXiv 2025

2025

SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning

arXiv 2025

2025

FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors

ICCV 2025

2025

DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation

arXiv 2025

2025

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement

arXiv 2025

2025

Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?

arXiv 2025

2025

ACE: Anti-Editing Concept Erasure in Text-to-Image Models

CVPR 2025

2025

Semantic-decoupled Spatial Partition Guided Point-supervised Oriented Object Detection

arXiv 2025

2025

Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning

arXiv 2024

2024

Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

arXiv 2024

2024

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

CVPR 2025

2024

Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion

arXiv 2024

2024

HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance

arXiv 2024

2024

OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

2023

Graph-based Topology Reasoning for Driving Scenes

arXiv 2023

2023

A Survey on Video Diffusion Models

arXiv 2023

2023

Baichuan 2: Open Large-scale Language Models

arXiv 2023

2023

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

arXiv 2023

2023

MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing

arXiv 2023

2023

PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection

ICCV 2023

2023

Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models

arXiv 2023

2023

ZeroGen: Efficient Zero-shot Learning via Dataset Generation

arXiv 2022

2022

Affiliations

No known affiliations.

Frequent co-authors

10

from 25 papers