Yu Liu

Papers
46

Authored papers

46

Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation

arXiv 2026

2026

Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation

arXiv 2026

2026

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

arXiv 2025

2025

VACE: All-in-One Video Creation and Editing

ICCV 2025

2025

Seed1.5-VL Technical Report

arXiv 2025

2025

Universal Actions for Enhanced Embodied Foundation Models

CVPR 2025

2025

GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing

arXiv 2025

2025

SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation

arXiv 2025

2025

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

arXiv 2025

2025

Do LLMs Feel? Teaching Emotion Recognition with Prompts, Retrieval, and Curriculum Learning

arXiv 2025

2025

HunyuanImage 3.0 Technical Report

arXiv 2025

2025

Rethinking Video Tokenization: A Conditioned Diffusion-based Approach

arXiv 2025

2025

Flow-Anchored Consistency Models

arXiv 2025

2025

ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting

arXiv 2025

2025

Wan: Open and Advanced Large-Scale Video Generative Models

arXiv 2025

2025

Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance

arXiv 2025

2025

In-Context LoRA for Diffusion Transformers

arXiv 2024

2024

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

arXiv 2024

2024

Phased Consistency Models

arXiv 2024

2024

PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements

arXiv 2024

2024

AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data

arXiv 2024

2024

ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers

arXiv 2024

2024

Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance

CVPR 2024

2024

Depth Attention for Robust RGB Tracking

arXiv 2024

2024

Enhancing Vision-Language Model with Unmasked Token Alignment

arXiv 2024

2024

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

arXiv 2024

2024

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

arXiv 2024

2024

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

arXiv 2024

2024

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

arXiv 2024

2024

Instruction-Guided Visual Masking

arXiv 2024

2024

IDEA-Bench: How Far are Generative Models from Professional Designing?

CVPR 2025

2024

ControlEdit: A MultiModal Local Clothing Image Editing Method

arXiv 2024

2024

MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

arXiv 2024

2024

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

CVPR 2024

2023

Composer: Creative and Controllable Image Synthesis with Composable Conditions

arXiv 2023

2023

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

arXiv 2023

2023

Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising

arXiv 2023

2023

Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction

ICCV 2023

2023

3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability

ICCV 2023

2023

GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding

ICCV 2023

2023

TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification

arXiv 2023

2023

DETRs with Collaborative Hybrid Assignments Training

ICCV 2023

2022

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning

arXiv 2022

2022

Large-batch Optimization for Dense Visual Predictions

arXiv 2022

2022

Self-slimmed Vision Transformer

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 46 papers