Yang Zhou
- Papers: 40
Authored papers
WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation
arXiv 2026
Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models
arXiv 2026
RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation
arXiv 2026
DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving
arXiv 2026
DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning
arXiv 2026
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
arXiv 2026
EditCtrl: Disentangled Local and Global Control for Real-Time Generative Video Editing
arXiv 2026
Generative AI for Autonomous Driving: Frontiers and Opportunities
arXiv 2025
Attention Distillation: A Unified Approach to Visual Characteristics Transfer
CVPR 2025
Kinetics: Rethinking Test-Time Scaling Laws
arXiv 2025
SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data
arXiv 2025
LangCoop: Collaborative Driving with Language
arXiv 2025
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning
arXiv 2025
Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection
arXiv 2025
M3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark
arXiv 2025
Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe
arXiv 2025
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
arXiv 2025
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
arXiv 2025
Aether: Geometric-Aware Unified World Modeling
ICCV 2025
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
arXiv 2025
VeriGUI: Verifiable Long-Chain GUI Dataset
arXiv 2025
LLM Inference Unveiled: Survey and Roofline Model Insights
arXiv 2024
MagicPIG: LSH Sampling for Efficient LLM Generation
arXiv 2024
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
arXiv 2024
OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving
arXiv 2024
Progressive Autoregressive Video Diffusion Models
arXiv 2024
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications
arXiv 2024
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving
arXiv 2024
BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays
arXiv 2024
Region Attention Transformer for Medical Image Restoration
arXiv 2024
Sirius: Contextual Sparsity with Correction for Efficient LLMs
arXiv 2024
UrFound: Towards Universal Retinal Foundation Models via Knowledge-Guided Masked Modeling
arXiv 2024
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
arXiv 2024
Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization
arXiv 2024
ContactGen: Generative Contact Modeling for Grasp Generation
Learning Navigational Visual Representations with Semantic Map Supervision
ICCV 2023
DRMC: A Generalist Model with Dynamic Routing for Multi-Center PET Image Synthesis
arXiv 2023
Modular Degradation Simulation and Restoration for Under-Display Camera
arXiv 2022
A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges
arXiv 2022
Rethinking Performance Gains in Image Dehazing Networks
arXiv 2022