Zongxin Yang

Papers
22

Authored papers

22

RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details

arXiv 2026

2026

Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models

arXiv 2026

2026

MedSAM2: Segment Anything in 3D Medical Images and Videos

arXiv 2025

2025

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge

arXiv 2025

2025

DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models

ICCV 2025

2025

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation

CVPR 2025 1

2025

BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment

arXiv 2025

2025

3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering

arXiv 2025

2025

MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis

arXiv 2024

2024

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)

arXiv 2024

2024

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

arXiv 2024

2024

3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation

arXiv 2024

2024

Replication in Visual Diffusion Models: A Survey and Outlook

arXiv 2024

2024

Segment and Track Anything

arXiv 2023

2023

Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation

ICCV 2023 1

2023

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

CVPR 2024 1

2023

Human101: Training 100+FPS Human Gaussians in 100s from 1 View

arXiv 2023

2023

CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation

arXiv 2023

2023

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

ICCV 2023 1

2023

Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation

ICCV 2023 1

2023

Video Object Segmentation in Panoptic Wild Scenes

arXiv 2023

2023

V$^2$L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval

arXiv 2022

2022

Affiliations

No known affiliations.

Frequent co-authors

10

from 22 papers