Yaohui Wang

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

arXiv 2025

InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models

arXiv 2025

RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling

arXiv 2025

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models

arXiv 2025

DeepSeek-V3 Technical Report

arXiv 2024

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

arXiv 2024

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

arXiv 2024

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

arXiv 2024

Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model

arXiv 2024

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

CVPR 2025

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

arXiv 2024

Latte: Latent Diffusion Transformer for Video Generation

arXiv 2024

Vlogger: Make Your Dream A Vlog

CVPR 2024

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

arXiv 2023

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

arXiv 2023

Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model

arXiv 2023

Long-Term Rhythmic Video Soundtracker

arXiv 2023

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

CVPR 2024