Hao Tan

Papers
25

Authored papers

25

OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation

arXiv 2026

2026

tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction

arXiv 2026

2026

Gaussian Mixture Flow Matching Models

arXiv 2025

2025

pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation

arXiv 2025

2025

E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training

arXiv 2025

2025

HunyuanVideo 1.5 Technical Report

arXiv 2025

2025

GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding

arXiv 2025

2025

4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time

arXiv 2025

2025

Rethinking Training Dynamics in Scale-wise Autoregressive Generation

arXiv 2025

2025

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

arXiv 2024

2024

Turbo3D: Ultra-fast Text-to-3D Generation

CVPR 2025 1

2024

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers

arXiv 2024

2024

LRM-Zero: Training Large Reconstruction Models with Synthesized Data

arXiv 2024

2024

Progressive Autoregressive Video Diffusion Models

arXiv 2024

2024

Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models

arXiv 2024

2024

HunyuanVideo: A Systematic Framework For Large Video Generative Models

arXiv 2024

2024

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

arXiv 2024

2024

Learning Navigational Visual Representations with Semantic Map Supervision

ICCV 2023 1

2023

DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents

arXiv 2023

2023

Scaling Data Generation in Vision-and-Language Navigation

ICCV 2023 1

2023

VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer

NeurIPS 2021 12

2021

How Much Can CLIP Benefit Vision-and-Language Tasks?

arXiv 2021

2021

Unifying Vision-and-Language Tasks via Text Generation

arXiv 2021

2021

Expressing Visual Relationships via Language

2019

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 25 papers