Hao Tang

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

59

SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models

arXiv 2026

2026

MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing

arXiv 2026

2026

Code2Worlds: Empowering Coding LLMs for 4D World Generation

arXiv 2026

2026

Anisotropic Modality Align

arXiv 2026

2026

Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models

arXiv 2026

2026

GeneralVLA: Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning

arXiv 2026

2026

AnyDepth: Depth Estimation Made Easy

arXiv 2026

2026

UniMesh: Unifying 3D Mesh Understanding and Generation

arXiv 2026

2026

PresentAgent-2: Towards Generalist Multimodal Presentation Agents

arXiv 2026

2026

MMA: Multimodal Memory Agent

arXiv 2026

2026

MWM: Mobile World Models for Action-Conditioned Consistent Prediction

arXiv 2026

2026

Light4D: Training-Free Extreme Viewpoint 4D Video Relighting

arXiv 2026

2026

MoRL: Reinforced Reasoning for Unified Motion Understanding and Generation

arXiv 2026

2026

WebCryptoAgent: Agentic Crypto Trading with Web Informatics

arXiv 2026

2026

StereoAdapter-2: Globally Structure-Consistent Underwater Stereo Depth Estimation

arXiv 2026

2026

3D CoCa v2: Contrastive Learners with Test-Time Search for Generalizable Spatial Intelligence

arXiv 2026

2026

HSG: Hyperbolic Scene Graph

arXiv 2026

2026

SAM 3D: 3Dfy Anything in Images

arXiv 2025

2025

Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality

arXiv 2025

2025

Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

CVPR 2025 1

2025

Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation

arXiv 2025

2025

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

arXiv 2025

2025

PoE-World: Compositional World Modeling with Products of Programmatic Experts

arXiv 2025

2025

Learning Compact Vision Tokens for Efficient Large Multimodal Models

arXiv 2025

2025

3D CoCa: Contrastive Learners are 3D Captioners

arXiv 2025

2025

ReMoMask: Retrieval-Augmented Masked Motion Generation

arXiv 2025

2025

MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots

arXiv 2025

2025

3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding

arXiv 2025

2025

UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface

arXiv 2025

2025

TwinAligner: Visual-Dynamic Alignment Empowers Physics-aware Real2Sim2Real for Robotic Manipulation

arXiv 2025

2025

EvoVLA: Self-Evolving Vision-Language-Action Model

arXiv 2025

2025

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

arXiv 2025

2025

DragMesh: Interactive 3D Generation Made Easy

arXiv 2025

2025

Nav-R1: Reasoning and Navigation in Embodied Scenes

arXiv 2025

2025

StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes

arXiv 2025

2025

Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis

arXiv 2025

2025

EgoLCD: Egocentric Video Generation with Long Context Diffusion

arXiv 2025

2025

Enabling Flexible Multi-LLM Integration for Scalable Knowledge Aggregation

arXiv 2025

2025

GiT: Towards Generalist Vision Transformer through Universal Language Interface

arXiv 2024

2024

Combining Induction and Transduction for Abstract Reasoning

arXiv 2024

2024

Barbie: Text to Barbie-Style 3D Avatars

arXiv 2024

2024

MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation

arXiv 2024

2024

Stable-Hair: Real-World Hair Transfer via Diffusion Model

arXiv 2024

2024

KMM: Key Frame Mask Mamba for Extended Motion Generation

arXiv 2024

2024

3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance

arXiv 2024

2024

InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation

arXiv 2024

2024

Hierarchical Indexing for Retrieval-Augmented Opinion Summarization

arXiv 2024

2024

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

CVPR 2023 1

2023

UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation

ICCV 2023 1

2023

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

CVPR 2023 1

2023

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

CVPR 2024 1

2023

Attributable and Scalable Opinion Summarization

arXiv 2023

2023

Hierarchical Sketch Induction for Paraphrase Generation

ACL 2022 5

2022

Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization

arXiv 2021

2021

Bi-Mix: Bidirectional Mixing for Domain Adaptive Nighttime Semantic Segmentation

arXiv 2021

2021

DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis

CVPR 2022 1

2020

Vector-Quantized Autoregressive Predictive Coding

arXiv 2020

2020

Unified Generative Adversarial Networks for Controllable Image-to-Image Translation

arXiv 2019

2019

Fast and Robust Dynamic Hand Gesture Recognition via Key Frames Extraction and Feature Fusion

arXiv 2019

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 59 papers