Pengfei Wan
- Papers: 57
Authored papers (57)
Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos
arXiv 2026
TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
arXiv 2026
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
arXiv 2026
LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV
arXiv 2026
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
arXiv 2026
A Mechanistic View on Video Generation as World Models: State and Dynamics
arXiv 2026
VINO: A Unified Visual Generator with Interleaved OmniModal Context
arXiv 2026
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
arXiv 2026
Stable Velocity: A Variance Perspective on Flow Matching
arXiv 2026
CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation
arXiv 2026
Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers
arXiv 2026
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
arXiv 2026
Flow-GRPO: Training Flow Matching Models via Online RL
arXiv 2025
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
ICCV 2025
GameFactory: Creating New Games with Generative Interactive Videos
ICCV 2025
Training-Free Efficient Video Generation via Dynamic Token Carving
arXiv 2025
RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification
arXiv 2025
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
arXiv 2025
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
arXiv 2025
Scaling Image and Video Generation via Test-Time Evolutionary Search
arXiv 2025
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
arXiv 2025
Monet: Reasoning in Latent Visual Space Beyond Images and Language
arXiv 2025
SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder
arXiv 2025
PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning
arXiv 2025
UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation
arXiv 2025
Simulating the Visual World with Artificial Intelligence: A Roadmap
arXiv 2025
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
arXiv 2025
Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention
arXiv 2025
HunyuanImage 3.0 Technical Report
arXiv 2025
Latent Diffusion Model without Variational Autoencoder
arXiv 2025
OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes
arXiv 2025
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
arXiv 2025
RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark
arXiv 2025
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives
arXiv 2025
GARDO: Reinforcing Diffusion Models without Reward Hacking
arXiv 2025
VMoBA: Mixture-of-Block Attention for Video Diffusion Models
arXiv 2025
VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning
arXiv 2025
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
arXiv 2025
VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning
arXiv 2025
OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing
arXiv 2025
HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment
arXiv 2025
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
arXiv 2025
SketchVideo: Sketch-based Video Generation and Editing
CVPR 2025 1
SPF-Portrait: Towards Pure Portrait Customization with Semantic Pollution-Free Fine-tuning
arXiv 2025
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
arXiv 2025
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
arXiv 2024
StyleMaster: Stylize Your Video with Artistic Generation and Translation
CVPR 2025 1
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
arXiv 2024
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
arXiv 2024
VideoTetris: Towards Compositional Text-to-Video Generation
arXiv 2024
PlacidDreamer: Advancing Harmony in Text-to-3D Generation
arXiv 2024
Agent Attention: On the Integration of Softmax and Linear Attention
arXiv 2023
I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models
arXiv 2023
DVIS: Decoupled Video Instance Segmentation Framework
ICCV 2023 1
DVIS++: Improved Decoupled Framework for Universal Video Segmentation
arXiv 2023
Augmentation-Aware Self-Supervision for Data-Efficient GAN Training
BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation
NeurIPS 2021 12
Frequent co-authors
10 from 57 papers