Zhenheng Yang
Papers: 13
Authored papers
BitDance: Scaling Autoregressive Generative Models with Binary Tokens
arXiv 2026
Implicit Neural Representation Facilitates Unified Universal Vision Encoding
arXiv 2026
Show-o2: Improved Native Unified Multimodal Models
arXiv 2025
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
ICCV 2025
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling
arXiv 2025
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs
arXiv 2025
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
arXiv 2025
The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation
arXiv 2025
Parallelized Autoregressive Visual Generation
CVPR 2025
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
arXiv 2024
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
CVPR 2025
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
arXiv 2024
TALL: Temporal Activity Localization via Language Query
Frequent co-authors
10 from 13 papers