Xintao Wang
- Papers: 61
Authored papers: 61
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
arXiv 2026
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
arXiv 2026
Flow-GRPO: Training Flow Matching Models via Online RL
arXiv 2025
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
ICCV 2025
GameFactory: Creating New Games with Generative Interactive Videos
ICCV 2025
BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation
arXiv 2025
Scaling Image and Video Generation via Test-Time Evolutionary Search
arXiv 2025
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
arXiv 2025
SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder
arXiv 2025
GARDO: Reinforcing Diffusion Models without Reward Hacking
arXiv 2025
Simulating the Visual World with Artificial Intelligence: A Roadmap
arXiv 2025
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
arXiv 2025
Latent Diffusion Model without Variational Autoencoder
arXiv 2025
OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes
arXiv 2025
VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning
arXiv 2025
ARIA: Training Language Agents with Intention-Driven Reward Aggregation
arXiv 2025
VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning
arXiv 2025
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
arXiv 2025
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles
arXiv 2025
SketchVideo: Sketch-based Video Generation and Editing
CVPR 2025
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
arXiv 2025
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
CVPR 2024
StyleMaster: Stylize Your Video with Artistic Generation and Translation
CVPR 2025
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing
CVPR 2024
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
arXiv 2024
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
arXiv 2024
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
arXiv 2024
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
arXiv 2024
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
arXiv 2024
VideoTetris: Towards Compositional Text-to-Video Generation
arXiv 2024
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
arXiv 2024
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
arXiv 2024
InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews
arXiv 2023
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
arXiv 2023
DreamDiffusion: Generating High-Quality Images from Brain EEG Signals
arXiv 2023
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
ICCV 2023
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
arXiv 2023
Making LLaMA SEE and Draw with SEED Tokenizer
arXiv 2023
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
arXiv 2023
ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models
arXiv 2023
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
CVPR 2024
T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
arXiv 2023
HAT: Hybrid Attention Transformer for Image Restoration
arXiv 2023
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
arXiv 2023
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
arXiv 2023
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
ICCV 2023
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
arXiv 2023
CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
arXiv 2023
TaleCrafter: Interactive Story Visualization with Multiple Characters
arXiv 2023
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
arXiv 2023
Inserting Anybody in Diffusion Models via Celeb Basis
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
CVPR 2024
Can Large Language Models Understand Real-World Complex Instructions?
arXiv 2023
Activating More Pixels in Image Super-Resolution Transformer
CVPR 2023
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
ICCV 2023
AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos
arXiv 2022
NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results
arXiv 2022
Towards Real-World Blind Face Restoration with Generative Facial Prior
CVPR 2021
Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data
arXiv 2021
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
arXiv 2018
Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform