Yong Zhang

Attribution

Semantic Scholar

Authored papers

61

Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry

arXiv 2026

2026

Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory

arXiv 2026

2026

CPPO: Contrastive Perception for Vision Language Policy Optimization

arXiv 2026

2026

WildActor: Unconstrained Identity-Preserving Video Generation

arXiv 2026

2026

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

arXiv 2025

2025

Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models

arXiv 2025

2025

From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model

arXiv 2025

2025

Active Intelligence in Video Avatars via Closed-loop World Modeling

arXiv 2025

2025

A Survey of Data Agents: Emerging Paradigm or Overstated Hype?

arXiv 2025

2025

Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes

arXiv 2025

2025

Mobius: Text to Seamless Looping Video Generation via Latent Shift

arXiv 2025

2025

Sentinel: Attention Probing of Proxy Models for LLM Context Compression with an Understanding Perspective

arXiv 2025

2025

CASP: Compression of Large Multimodal Models Based on Attention Sparsity

CVPR 2025

2025

StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos

arXiv 2024

2024

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

CVPR 2024

2024

AnchorCrafter: Animate CyberAnchors Selling Your Products via Human-Object Interacting Video Generation

arXiv 2024

2024

Evaluating LLM Reasoning in the Operations Research Domain with ORQA

arXiv 2024

2024

CV-VAE: A Compatible Video VAE for Latent Generative Video Models

arXiv 2024

2024

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

CVPR 2025

2024

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

arXiv 2024

2024

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

arXiv 2024

2024

ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation

arXiv 2024

2024

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities

arXiv 2024

2024

Efficiently Serving Large Multimodal Models Using EPD Disaggregation

arXiv 2024

2024

LaWa: Using Latent Space for In-Generation Image Watermarking

arXiv 2024

2024

OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models

arXiv 2024

2024

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation

CVPR 2025

2024

Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning

arXiv 2024

2024

Towards Secure and Usable 3D Assets: A Novel Framework for Automatic Visible Watermarking

arXiv 2024

2024

CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training

arXiv 2024

2024

GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation

arXiv 2024

2024

Task-Agnostic Language Model Watermarking via High Entropy Passthrough Layers

arXiv 2024

2024

AMUSE: Adaptive Multi-Segment Encoding for Dataset Watermarking

arXiv 2024

2024

T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations

arXiv 2023

2023

DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

arXiv 2023

2023

From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning

arXiv 2023

2023

NL4Opt Competition: Formulating Optimization Problems Based on Their Natural Language Descriptions

arXiv 2023

2023

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

ICCV 2023

2023

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

arXiv 2023

2023

FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling

arXiv 2023

2023

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

arXiv 2023

2023

DPE: Disentanglement of Pose and Expression for General Video Portrait Editing

CVPR 2023

2023

AnimateZero: Video Diffusion Models are Zero-Shot Image Animators

arXiv 2023

2023

ReliableSwap: Boosting General Face Swapping Via Reliable Supervision

arXiv 2023

2023

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

CVPR 2024

2023

Domain Generalization via Rationale Invariance

ICCV 2023

2023

Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation

ICCV 2023

2023

DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection

2023

TaleCrafter: Interactive Story Visualization with Multiple Characters

arXiv 2023

2023

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation

arXiv 2023

2023

Inserting Anybody in Diffusion Models via Celeb Basis

2023

Improved Test-Time Adaptation for Domain Generalization

CVPR 2023

2023

ETran: Energy-Based Transferability Estimation

ICCV 2023

2023

ArchBERT: Bi-Modal Understanding of Neural Architectures and Natural Languages

arXiv 2023

2023

Latent Video Diffusion Models for High-Fidelity Long Video Generation

arXiv 2022

2022

E-LANG: Energy-Based Joint Inferencing of Super and Swift Language Models

2022

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

CVPR 2023

2022

Self-supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection

CVPR 2022

2022

SimROD: A Simple Adaptation Method for Robust Object Detection

ICCV 2021

2021

Generating Self-Contained and Summary-Centric Question Answer Pairs via Differentiable Reward Imitation Learning

EMNLP 2021

2021

EBJR: Energy-Based Joint Reasoning for Adaptive Inference

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 61 papers