Manyuan Zhang

Papers
23

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

23

OpenGame: Open Agentic Coding for Games

arXiv 2026

2026

Gen-Searcher: Reinforcing Agentic Search for Image Generation

arXiv 2026

2026

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

arXiv 2026

2026

Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence

arXiv 2026

2026

Exploring Reasoning Reward Model for Agents

arXiv 2026

2026

4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding

arXiv 2026

2026

Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

arXiv 2026

2026

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

arXiv 2026

2026

MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data

arXiv 2026

2026

AutoWeather4D: Autonomous Driving Video Weather Conversion via G-Buffer Dual-Pass Editing

arXiv 2026

2026

Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

arXiv 2026

2026

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

ICCV 2025

2025

OneThinker: All-in-one Reasoning Model for Image and Video

arXiv 2025

2025

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

arXiv 2025

2025

AdaTooler-V: Adaptive Tool-Use for Images and Videos

arXiv 2025

2025

OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation

arXiv 2025

2025

Architecture Decoupling Is Not All You Need For Unified Multimodal Model

arXiv 2025

2025

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

arXiv 2025

2025

IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction

arXiv 2025

2025

CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images

arXiv 2025

2025

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

arXiv 2025

2025

Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation

arXiv 2025

2025

VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation

ICCV 2023

2023

Affiliations

No known affiliations.

Frequent co-authors

10

from 23 papers