Shitian Zhao
- Papers: 12
Authored papers
PyVision-RL: Forging Open Agentic Vision Models via RL
arXiv 2026
Sekai: A Video Dataset towards World Exploration
arXiv 2025
Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning
arXiv 2025
OmniCaptioner: One Captioner to Rule Them All
arXiv 2025
PyVision: Agentic Vision with Dynamic Tooling
arXiv 2025
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
arXiv 2025
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
arXiv 2025
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
arXiv 2025
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
arXiv 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
arXiv 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
arXiv 2024
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
arXiv 2024
Affiliations
Frequent co-authors
10 from 12 papers