Yinan He
Authored papers (18)
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
arXiv 2025
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
arXiv 2025
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
arXiv 2025
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
arXiv 2025
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
ICCV 2025
ExpVid: A Benchmark for Experiment Video Understanding & Reasoning
arXiv 2025
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
arXiv 2025
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
arXiv 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
arXiv 2025
VideoMamba: State Space Model for Efficient Video Understanding
arXiv 2024
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
arXiv 2024
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
CVPR 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
arXiv 2024
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
CVPR 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models
arXiv 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
ICCV 2023
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
arXiv 2022
Frequent co-authors
10 from 18 papers