Yali Wang
- Papers: 22
Authored papers (22)
What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion
arXiv 2026
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
arXiv 2025
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
arXiv 2025
MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation
arXiv 2025
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
ICCV 2025
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
ICCV 2025
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
arXiv 2024
VideoMamba: State Space Model for Efficient Video Understanding
arXiv 2024
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model
arXiv 2024
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
CVPR 2025
MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
arXiv 2024
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
arXiv 2024
Vlogger: Make Your Dream A Vlog
CVPR 2024
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
arXiv 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
arXiv 2024
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
arXiv 2024
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
CVPR 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
ICCV 2023
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
arXiv 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
arXiv 2022
Self-slimmed Vision Transformer
arXiv 2021
Frequent co-authors: 10, from 22 papers