Wen Wang
- Papers
- 37
Authored papers (37)
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
arXiv 2026
Extending Precipitation Nowcasting Horizons via Spectral Fusion of Radar Observations and Foundation Model Priors
arXiv 2026
Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models
arXiv 2026
Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
arXiv 2026
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
arXiv 2025
InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation
arXiv 2025
Fun-Audio-Chat Technical Report
arXiv 2025
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
arXiv 2025
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting
arXiv 2025
The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text
arXiv 2025
MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues
arXiv 2025
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
arXiv 2025
HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
arXiv 2025
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing
arXiv 2025
GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding
arXiv 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video
arXiv 2025
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
arXiv 2025
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
arXiv 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
arXiv 2024
MagicQuill: An Intelligent Interactive Image Editing System
CVPR 2025
AniDoc: Animation Creation Made Easier
CVPR 2025
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
arXiv 2024
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
arXiv 2024
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
CVPR 2024
LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models
arXiv 2024
SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning
arXiv 2024
FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
arXiv 2024
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
CVPR 2025
Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling
arXiv 2023
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
arXiv 2023
SegGPT: Segmenting Everything In Context
arXiv 2023
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort
arXiv 2023
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation
arXiv 2023
Object-aware Inversion and Reassembly for Image Editing
arXiv 2023
CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation
arXiv 2023
CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose
CVPR 2023
PoNet: Pooling Network for Efficient Token Mixing in Long Sequences