Tong He

Authored papers

35

Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video

arXiv 2026

2026

VINO: A Unified Visual Generator with Interleaved OmniModal Context

arXiv 2026

2026

Geo-Align: Video Generation Alignment via Metric Geometry Reward

arXiv 2026

2026

Sekai: A Video Dataset towards World Exploration

arXiv 2025

2025

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

arXiv 2025

2025

Yume-1.5: A Text-Controlled Interactive World Generation Model

arXiv 2025

2025

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning

arXiv 2025

2025

OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

arXiv 2025

2025

Aether: Geometric-Aware Unified World Modeling

ICCV 2025

2025

WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool

arXiv 2025

2025

BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation

arXiv 2025

2025

GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

CVPR 2025

2025

Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

arXiv 2024

2024

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

arXiv 2024

2024

PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines

arXiv 2024

2024

RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation

arXiv 2024

2024

Depth Any Video with Scalable Synthetic Data

arXiv 2024

2024

Hallucination of Multimodal Large Language Models: A Survey

arXiv 2024

2024

Unified Lexical Representation for Interpretable Visual-Language Alignment

arXiv 2024

2024

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

arXiv 2024

2024

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

arXiv 2024

2024

Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation

arXiv 2024

2024

SAM3D: Segment Anything in 3D Scenes

arXiv 2023

2023

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

arXiv 2023

2023

Consistent Video-to-Video Transfer Using Synthetic Dataset

arXiv 2023

2023

Coarse-to-Fine Amodal Segmentation with Shape Prior

ICCV 2023

2023

Object-Centric Multiple Object Tracking

ICCV 2023

2023

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

CVPR 2024

2023

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

CVPR 2024

2023

Unsupervised Open-Vocabulary Object Localization in Videos

ICCV 2023

2023

Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation

ICCV 2023

2023

ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

2020

ResNeSt: Split-Attention Networks

arXiv 2020

2020

Bag of Freebies for Training Object Detection Neural Networks

arXiv 2019

2019

FCOS: Fully Convolutional One-Stage Object Detection

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 35 papers