Zonghao Guo
- Papers: 10
Authored papers
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
arXiv 2026
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
arXiv 2025
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
arXiv 2025
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
arXiv 2025
Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition
arXiv 2025
ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
arXiv 2025
Video-R1: Reinforcing Video Reasoning in MLLMs
arXiv 2025
Scaling Efficient Masked Image Modeling on Large Remote Sensing Dataset
ICCV 2025
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
arXiv 2024
Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection
ICCV 2023