Ziyu Guo
- Papers: 20
Authored papers (20)
GENIUS: Generative Fluid Intelligence Evaluation Suite
arXiv 2026
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
arXiv 2025
EditThinker: Unlocking Iterative Reasoning for Any Image Editor
arXiv 2025
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark
arXiv 2025
Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking
arXiv 2025
BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities
arXiv 2025
CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms
arXiv 2025
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
arXiv 2025
Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation
arXiv 2025
DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation
arXiv 2025
Architecture Decoupling Is Not All You Need For Unified Multimodal Model
arXiv 2025
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step
arXiv 2025
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
arXiv 2024
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners
arXiv 2024
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
arXiv 2024
ImageBind-LLM: Multi-modality Instruction Tuning
arXiv 2023
Personalize Segment Anything Model with One Shot
arXiv 2023
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
arXiv 2023
PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning
ICCV 2023 1
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
ICCV 2023 1
Frequent co-authors
10 from 20 papers