Wen Wang

Papers
37

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

37

Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image

arXiv 2026

2026

Extending Precipitation Nowcasting Horizons via Spectral Fusion of Radar Observations and Foundation Model Priors

arXiv 2026

2026

Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models

arXiv 2026

2026

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models

arXiv 2026

2026

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

arXiv 2025

2025

InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation

arXiv 2025

2025

Fun-Audio-Chat Technical Report

arXiv 2025

2025

Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

arXiv 2025

2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

arXiv 2025

2025

The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text

arXiv 2025

2025

MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues

arXiv 2025

2025

Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models

arXiv 2025

2025

HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

arXiv 2025

2025

ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing

arXiv 2025

2025

GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding

arXiv 2025

2025

OmniAudio: Generating Spatial Audio from 360-Degree Video

arXiv 2025

2025

Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset

arXiv 2025

2025

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

arXiv 2024

2024

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

arXiv 2024

2024

MagicQuill: An Intelligent Interactive Image Editing System

CVPR 2025

2024

AniDoc: Animation Creation Made Easier

CVPR 2025

2024

HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems

arXiv 2024

2024

CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification

arXiv 2024

2024

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

CVPR 2024

2024

LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models

arXiv 2024

2024

SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning

arXiv 2024

2024

FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior

arXiv 2024

2024

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

CVPR 2025

2024

Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling

arXiv 2023

2023

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

arXiv 2023

2023

SegGPT: Segmenting Everything In Context

arXiv 2023

2023

AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort

arXiv 2023

2023

CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation

arXiv 2023

2023

Object-aware Inversion and Reassembly for Image Editing

arXiv 2023

2023

CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation

arXiv 2023

2023

CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose

CVPR 2023

2022

PoNet: Pooling Network for Efficient Token Mixing in Long Sequences

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 37 papers