Tong He
Authored papers (35)
Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video
arXiv 2026
VINO: A Unified Visual Generator with Interleaved OmniModal Context
arXiv 2026
Geo-Align: Video Generation Alignment via Metric Geometry Reward
arXiv 2026
Sekai: A Video Dataset towards World Exploration
arXiv 2025
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
arXiv 2025
Yume-1.5: A Text-Controlled Interactive World Generation Model
arXiv 2025
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning
arXiv 2025
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
arXiv 2025
Aether: Geometric-Aware Unified World Modeling
ICCV 2025
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
arXiv 2025
BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
arXiv 2025
GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving
CVPR 2025
Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model
arXiv 2024
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
arXiv 2024
PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
arXiv 2024
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation
arXiv 2024
Depth Any Video with Scalable Synthetic Data
arXiv 2024
Hallucination of Multimodal Large Language Models: A Survey
arXiv 2024
Unified Lexical Representation for Interpretable Visual-Language Alignment
arXiv 2024
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
arXiv 2024
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
arXiv 2024
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
arXiv 2024
SAM3D: Segment Anything in 3D Scenes
arXiv 2023
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
arXiv 2023
Consistent Video-to-Video Transfer Using Synthetic Dataset
arXiv 2023
Coarse-to-Fine Amodal Segmentation with Shape Prior
ICCV 2023
Object-Centric Multiple Object Tracking
ICCV 2023
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
CVPR 2024
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
CVPR 2024
Unsupervised Open-Vocabulary Object Localization in Videos
ICCV 2023
Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation
ICCV 2023
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network
CVPR 2020
ResNeSt: Split-Attention Networks
arXiv 2020
Bag of Freebies for Training Object Detection Neural Networks
arXiv 2019
FCOS: Fully Convolutional One-Stage Object Detection
ICCV 2019
Frequent co-authors
10 from 35 papers