Yuanhan Zhang
- Papers: 13
Authored papers
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence
arXiv 2026
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
arXiv 2026
EgoLife: Towards Egocentric Life Assistant
CVPR 2025
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
arXiv 2025
LLaVA-OneVision: Easy Visual Task Transfer
arXiv 2024
Long Context Transfer from Language to Vision
arXiv 2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
arXiv 2024
OtterHD: A High-Resolution Multi-modality Model
arXiv 2023
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
arXiv 2023
FunQA: Towards Surprising Video Comprehension
arXiv 2023
Learning without Forgetting for Vision-Language Models
arXiv 2023
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy
arXiv 2022
Neural Prompt Search
arXiv 2022
Frequent co-authors
10 from 13 papers