Shuai Yang
- Papers: 42
Authored papers
A Pragmatic VLA Foundation Model
arXiv 2026
Causal World Modeling for Robot Control
arXiv 2026
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
arXiv 2026
DVD: Deterministic Video Depth Estimation with Generative Priors
arXiv 2026
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation
arXiv 2026
WORLDMEM: Long-term Consistent World Simulation with Memory
arXiv 2025
Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support
arXiv 2025
Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation
arXiv 2025
Training-Free Watermarking for Autoregressive Image Generation
arXiv 2025
Balanced Image Stylization with Style Matching Score
ICCV 2025
DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation
arXiv 2025
Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers
arXiv 2025
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
arXiv 2025
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
arXiv 2025
TokensGen: Harnessing Condensed Tokens for Long Video Generation
ICCV 2025
MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention
CVPR 2025
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives
arXiv 2025
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
arXiv 2025
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
arXiv 2025
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
arXiv 2025
RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling
arXiv 2025
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
arXiv 2025
Imagine360: Immersive 360 Video Generation from Perspective Anchor
arXiv 2024
GaussianAnything: Interactive Point Cloud Flow Matching For 3D Object Generation
arXiv 2024
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
arXiv 2024
3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors
arXiv 2024
SEED-Story: Multimodal Long Story Generation with Large Language Model
arXiv 2024
Grounded 3D-LLM with Referent Tokens
arXiv 2024
Forward Learning of Graph Neural Networks
arXiv 2024
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
CVPR 2024
LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation
arXiv 2024
StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces
ICCV 2023
Text2Performer: Text-Driven Human Video Generation
ICCV 2023
Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation
ICCV 2023
DeformToon3D: Deformable 3D Toonification from Neural Radiance Fields
arXiv 2023
Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation
ICCV 2023
Denoising Diffusion Step-aware Models
arXiv 2023
Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics
arXiv 2023
VToonify: Controllable High-Resolution Portrait Video Style Transfer
arXiv 2022
Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
CVPR 2022
Text2Human: Text-Driven Controllable Human Image Generation
arXiv 2022
BARS-CTR: Open Benchmarking for Click-Through Rate Prediction
arXiv 2020
Frequent co-authors
10 from 42 papers