Jinguo Zhu
- Papers: 12
Authored papers (12)
Attention Residuals
arXiv 2026
Kimi K2.5: Visual Agentic Intelligence
arXiv 2026
WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models
arXiv 2026
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
arXiv 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
arXiv 2025
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
arXiv 2025
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
arXiv 2025
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints
arXiv 2025
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
arXiv 2024
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
ICCV 2025
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
arXiv 2023
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
CVPR 2023
Frequent co-authors
10 from 12 papers