Yiren Song

Papers
24

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration

arXiv 2026

2026

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

arXiv 2026

2026

OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation

arXiv 2026

2026

EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer

ICCV 2025

2025

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

arXiv 2025

2025

Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks

ICCV 2025

2025

GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains

arXiv 2025

2025

MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation

arXiv 2025

2025

X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale

arXiv 2025

2025

The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment

arXiv 2025

2025

OmniRefiner: Reinforcement-Guided Local Diffusion Refinement

arXiv 2025

2025

H2R-Grounder: A Paired-Data-Free Paradigm for Translating Human Interaction Videos into Physically Grounded Robot Videos

arXiv 2025

2025

IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning

arXiv 2025

2025

LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer

ICCV 2025

2025

DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers

arXiv 2025

2025

MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks

arXiv 2025

2025

EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering

arXiv 2025

2025

PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data

arXiv 2025

2025

FocusedAD: Character-centric Movie Audio Description

arXiv 2025

2025

FonTS: Text Rendering with Typography and Style Controls

ICCV 2025

2024

Image Watermarks are Removable Using Controllable Regeneration from Clean Noise

arXiv 2024

2024

Stable-Hair: Real-World Hair Transfer via Diffusion Model

arXiv 2024

2024

Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model

arXiv 2024

2024

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

CVPR 2024

2023

Affiliations

No known affiliations.

Frequent co-authors

10 (from 24 papers)