Yunhai Tong

Papers: 25

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Towards Customized Multimodal Role-Play

arXiv 2026

2026

VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification

arXiv 2026

2026

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

arXiv 2025

2025

MMaDA: Multimodal Large Diffusion Language Models

arXiv 2025

2025

Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening

arXiv 2025

2025

CyberV: Cybernetics for Test-time Scaling in Video Understanding

arXiv 2025

2025

RecTok: Reconstruction Distillation along Rectified Flow

arXiv 2025

2025

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

arXiv 2025

2025

VMoBA: Mixture-of-Block Attention for Video Diffusion Models

arXiv 2025

2025

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

arXiv 2025

2025

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

arXiv 2025

2025

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer

ICCV 2025

2025

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

ICCV 2025

2025

Training-free Diffusion Acceleration with Bottleneck Sampling

arXiv 2025

2025

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

arXiv 2025

2025

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

arXiv 2025

2025

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

CVPR 2025 1

2024

SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow

arXiv 2024

2024

RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything

arXiv 2024

2024

RelationBooth: Towards Relation-Aware Customized Object Generation

arXiv 2024

2024

MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning

arXiv 2024

2024

Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation

ICCV 2023 1

2023

Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition

arXiv 2022

2022

TS2Vec: Towards Universal Representation of Time Series

arXiv 2021

2021

Customizing Graph Neural Networks using Path Reweighting

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors

from 25 papers

Xiangtai Li

Jianzong Wu

Lu Qi

Ye Tian

Qingyu Shi

Shilin Xu

Tao Zhang

Yujing Wang

Haochen Wang

Jiahao Meng