Zhaokai Wang
- Papers: 13
Authored papers (13)
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing
arXiv 2026
GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing
arXiv 2026
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
arXiv 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
arXiv 2025
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
arXiv 2025
Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models
arXiv 2025
GenExam: A Multidisciplinary Text-to-Image Exam
arXiv 2025
Vision-to-Music Generation: A Survey
arXiv 2025
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding
arXiv 2025
ITINERA: Integrating Spatial Optimization with Large Language Models for Open-domain Urban Itinerary Planning
arXiv 2024
Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation
arXiv 2024
Video Background Music Generation: Dataset, Method and Evaluation
ICCV 2023
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps
arXiv 2020
Frequent co-authors
10 from 13 papers