Hao Tang
Authored papers (59)
SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models
arXiv 2026
MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing
arXiv 2026
Code2Worlds: Empowering Coding LLMs for 4D World Generation
arXiv 2026
Anisotropic Modality Align
arXiv 2026
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
arXiv 2026
GeneralVLA: Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning
arXiv 2026
AnyDepth: Depth Estimation Made Easy
arXiv 2026
UniMesh: Unifying 3D Mesh Understanding and Generation
arXiv 2026
PresentAgent-2: Towards Generalist Multimodal Presentation Agents
arXiv 2026
MMA: Multimodal Memory Agent
arXiv 2026
MWM: Mobile World Models for Action-Conditioned Consistent Prediction
arXiv 2026
Light4D: Training-Free Extreme Viewpoint 4D Video Relighting
arXiv 2026
MoRL: Reinforced Reasoning for Unified Motion Understanding and Generation
arXiv 2026
WebCryptoAgent: Agentic Crypto Trading with Web Informatics
arXiv 2026
StereoAdapter-2: Globally Structure-Consistent Underwater Stereo Depth Estimation
arXiv 2026
3D CoCa v2: Contrastive Learners with Test-Time Search for Generalizable Spatial Intelligence
arXiv 2026
HSG: Hyperbolic Scene Graph
arXiv 2026
SAM 3D: 3Dfy Anything in Images
arXiv 2025
Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality
arXiv 2025
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
CVPR 2025 1
Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation
arXiv 2025
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
arXiv 2025
PoE-World: Compositional World Modeling with Products of Programmatic Experts
arXiv 2025
Learning Compact Vision Tokens for Efficient Large Multimodal Models
arXiv 2025
3D CoCa: Contrastive Learners are 3D Captioners
arXiv 2025
ReMoMask: Retrieval-Augmented Masked Motion Generation
arXiv 2025
MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots
arXiv 2025
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
arXiv 2025
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
arXiv 2025
TwinAligner: Visual-Dynamic Alignment Empowers Physics-aware Real2Sim2Real for Robotic Manipulation
arXiv 2025
EvoVLA: Self-Evolving Vision-Language-Action Model
arXiv 2025
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
arXiv 2025
DragMesh: Interactive 3D Generation Made Easy
arXiv 2025
Nav-R1: Reasoning and Navigation in Embodied Scenes
arXiv 2025
StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes
arXiv 2025
Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis
arXiv 2025
EgoLCD: Egocentric Video Generation with Long Context Diffusion
arXiv 2025
Enabling Flexible Multi-LLM Integration for Scalable Knowledge Aggregation
arXiv 2025
GiT: Towards Generalist Vision Transformer through Universal Language Interface
arXiv 2024
Combining Induction and Transduction for Abstract Reasoning
arXiv 2024
Barbie: Text to Barbie-Style 3D Avatars
arXiv 2024
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation
arXiv 2024
Stable-Hair: Real-World Hair Transfer via Diffusion Model
arXiv 2024
KMM: Key Frame Mask Mamba for Extended Motion Generation
arXiv 2024
3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance
arXiv 2024
InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation
arXiv 2024
Hierarchical Indexing for Retrieval-Augmented Opinion Summarization
arXiv 2024
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
CVPR 2023 1
UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation
ICCV 2023 1
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
CVPR 2023 1
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
CVPR 2024 1
Attributable and Scalable Opinion Summarization
arXiv 2023
Hierarchical Sketch Induction for Paraphrase Generation
ACL 2022 5
Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization
arXiv 2021
Bi-Mix: Bidirectional Mixing for Domain Adaptive Nighttime Semantic Segmentation
arXiv 2021
DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis
CVPR 2022 1
Vector-Quantized Autoregressive Predictive Coding
arXiv 2020
Unified Generative Adversarial Networks for Controllable Image-to-Image Translation
arXiv 2019
Fast and Robust Dynamic Hand Gesture Recognition via Key Frames Extraction and Feature Fusion
arXiv 2019
Frequent co-authors: 10 (from 59 papers)