Tong Wu
- Papers: 29
Authored papers
The AI Hippocampus: How Far are We From Human Memory?
arXiv 2026
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
arXiv 2025
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
arXiv 2025
RelightVid: Temporal-Consistent Diffusion Model for Video Relighting
arXiv 2025
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
arXiv 2025
V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties
arXiv 2025
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning
arXiv 2025
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
arXiv 2025
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography
ICCV 2025
SS4D: Native 4D Generative Model via Structured Spacetime Latents
arXiv 2025
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens
arXiv 2025
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
arXiv 2024
Imagine360: Immersive 360 Video Generation from Perspective Anchor
arXiv 2024
An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding
arXiv 2024
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
arXiv 2024
3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors
arXiv 2024
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
CVPR 2024
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
arXiv 2024
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
CVPR 2025
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models
arXiv 2024
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models
arXiv 2024
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
arXiv 2024
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models
arXiv 2024
Sinkhorn Distance Minimization for Knowledge Distillation
arXiv 2024
FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning
CVPR 2025
Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation
arXiv 2024
CoMT: Chain-of-Medical-Thought Reduces Hallucination in Medical Report Generation
arXiv 2024
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases
arXiv 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
CVPR 2024
Frequent co-authors
10 from 29 papers