Yong Zhang
- Papers: 61
Authored papers
Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry
arXiv 2026
Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory
arXiv 2026
CPPO: Contrastive Perception for Vision Language Policy Optimization
arXiv 2026
WildActor: Unconstrained Identity-Preserving Video Generation
arXiv 2026
Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation
arXiv 2025
Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models
arXiv 2025
From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model
arXiv 2025
Active Intelligence in Video Avatars via Closed-loop World Modeling
arXiv 2025
A Survey of Data Agents: Emerging Paradigm or Overstated Hype?
arXiv 2025
Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes
arXiv 2025
Mobius: Text to Seamless Looping Video Generation via Latent Shift
arXiv 2025
Sentinel: Attention Probing of Proxy Models for LLM Context Compression with an Understanding Perspective
arXiv 2025
CASP: Compression of Large Multimodal Models Based on Attention Sparsity
CVPR 2025 1
StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos
arXiv 2024
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
CVPR 2024 1
AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation
arXiv 2024
Evaluating LLM Reasoning in the Operations Research Domain with ORQA
arXiv 2024
CV-VAE: A Compatible Video VAE for Latent Generative Video Models
arXiv 2024
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
CVPR 2025 1
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
arXiv 2024
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
arXiv 2024
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
arXiv 2024
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
arXiv 2024
Efficiently Serving Large Multimodal Models Using EPD Disaggregation
arXiv 2024
LaWa: Using Latent Space for In-Generation Image Watermarking
arXiv 2024
OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
arXiv 2024
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
CVPR 2025 1
Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
arXiv 2024
Towards Secure and Usable 3D Assets: A Novel Framework for Automatic Visible Watermarking
arXiv 2024
CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training
arXiv 2024
GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation
arXiv 2024
Task-Agnostic Language Model Watermarking via High Entropy Passthrough Layers
arXiv 2024
AMUSE: Adaptive Multi-Segment Encoding for Dataset Watermarking
arXiv 2024
T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations
arXiv 2023
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
arXiv 2023
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
arXiv 2023
NL4Opt Competition: Formulating Optimization Problems Based on Their Natural Language Descriptions
arXiv 2023
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
ICCV 2023 1
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
arXiv 2023
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
arXiv 2023
ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models
arXiv 2023
DPE: Disentanglement of Pose and Expression for General Video Portrait Editing
CVPR 2023 1
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
arXiv 2023
ReliableSwap: Boosting General Face Swapping Via Reliable Supervision
arXiv 2023
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
CVPR 2024 1
Domain Generalization via Rationale Invariance
ICCV 2023 1
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
ICCV 2023 1
DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection
TaleCrafter: Interactive Story Visualization with Multiple Characters
arXiv 2023
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
arXiv 2023
Inserting Anybody in Diffusion Models via Celeb Basis
Improved Test-Time Adaptation for Domain Generalization
CVPR 2023 1
ETran: Energy-Based Transferability Estimation
ICCV 2023 1
ArchBERT: Bi-Modal Understanding of Neural Architectures and Natural Languages
arXiv 2023
Latent Video Diffusion Models for High-Fidelity Long Video Generation
arXiv 2022
E-LANG: Energy-Based Joint Inferencing of Super and Swift Language Models
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
CVPR 2023 1
Self-supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection
CVPR 2022 1
SimROD: A Simple Adaptation Method for Robust Object Detection
ICCV 2021 10
Generating Self-Contained and Summary-Centric Question Answer Pairs via Differentiable Reward Imitation Learning
EMNLP 2021 11
EBJR: Energy-Based Joint Reasoning for Adaptive Inference
arXiv 2021
Frequent co-authors: 10 (from 61 papers)