Yiyuan Zhang
- Papers: 12
Authored papers (12)
Seed1.5-VL Technical Report
arXiv 2025
Native-Resolution Image Synthesis
arXiv 2025
Multimodal Long Video Modeling Based on Temporal Dynamic Context
arXiv 2025
OneThinker: All-in-one Reasoning Model for Image and Video
arXiv 2025
Transition Models: Rethinking the Generative Learning Objective
arXiv 2025
Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations
arXiv 2024
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions
arXiv 2024
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
CVPR 2024
Explore the Limits of Omni-modal Pretraining at Scale
arXiv 2024
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
arXiv 2024
Meta-Transformer: A Unified Framework for Multimodal Learning
arXiv 2023
OneLLM: One Framework to Align All Modalities with Language
CVPR 2024
Frequent co-authors
10 from 12 papers