Yu Liu
- Papers: 46
Authored papers
Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation
arXiv 2026
Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
arXiv 2026
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
arXiv 2025
VACE: All-in-One Video Creation and Editing
ICCV 2025
Seed1.5-VL Technical Report
arXiv 2025
Universal Actions for Enhanced Embodied Foundation Models
CVPR 2025 1
GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing
arXiv 2025
SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation
arXiv 2025
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
arXiv 2025
Do LLMs Feel? Teaching Emotion Recognition with Prompts, Retrieval, and Curriculum Learning
arXiv 2025
HunyuanImage 3.0 Technical Report
arXiv 2025
Rethinking Video Tokenization: A Conditioned Diffusion-based Approach
arXiv 2025
Flow-Anchored Consistency Models
arXiv 2025
ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
arXiv 2025
Wan: Open and Advanced Large-Scale Video Generative Models
arXiv 2025
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
arXiv 2025
In-Context LoRA for Diffusion Transformers
arXiv 2024
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
arXiv 2024
Phased Consistency Models
arXiv 2024
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements
arXiv 2024
AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data
arXiv 2024
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers
arXiv 2024
Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
CVPR 2024 1
Depth Attention for Robust RGB Tracking
arXiv 2024
Enhancing Vision-Language Model with Unmasked Token Alignment
arXiv 2024
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
arXiv 2024
MoVA: Adapting Mixture of Vision Experts to Multimodal Context
arXiv 2024
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
arXiv 2024
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
arXiv 2024
Instruction-Guided Visual Masking
arXiv 2024
IDEA-Bench: How Far are Generative Models from Professional Designing?
CVPR 2025 1
ControlEdit: A MultiModal Local Clothing Image Editing Method
arXiv 2024
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
arXiv 2024
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
CVPR 2024 1
Composer: Creative and Controllable Image Synthesis with Composable Conditions
arXiv 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
arXiv 2023
Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising
arXiv 2023
Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction
ICCV 2023 1
3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability
ICCV 2023 1
GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding
ICCV 2023 1
TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification
arXiv 2023
DETRs with Collaborative Hybrid Assignments Training
ICCV 2023 1
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
arXiv 2022
Large-batch Optimization for Dense Visual Predictions
arXiv 2022
Self-slimmed Vision Transformer
arXiv 2021
Affiliations
Frequent co-authors
10 from 46 papers