Shoufa Chen
- Papers: 13
Authored papers (13)
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
arXiv 2026
WavFlow: Audio Generation in Waveform Space
arXiv 2026
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
arXiv 2025
HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming
arXiv 2025
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
arXiv 2025
PixelFlow: Pixel-Space Generative Models with Flow
arXiv 2025
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
arXiv 2024
ControlAR: Controllable Image Generation with Autoregressive Models
arXiv 2024
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM
arXiv 2024
Going Denser with Open-Vocabulary Part Segmentation
ICCV 2023 1
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
arXiv 2023
DiffusionDet: Diffusion Model for Object Detection
ICCV 2023 1
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
arXiv 2022
Frequent co-authors
10 from 13 papers