Zhenyu Yang
- Papers: 12
Authored papers
When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning
arXiv 2026
MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments
arXiv 2025
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
arXiv 2025
H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding
arXiv 2025
Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens
arXiv 2025
LiveStar: Live Streaming Assistant for Real-World Online Video Understanding
arXiv 2025
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
ICCV 2025
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding
arXiv 2025
Improved Visual-Spatial Reasoning via R1-Zero-Like Training
arXiv 2025
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
arXiv 2024
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
arXiv 2024
LaMemo: Language Modeling with Look-Ahead Memory
NAACL 2022 7
Frequent co-authors
10 from 12 papers