Yunhai Tong
- Papers: 25
Authored papers (25)
Towards Customized Multimodal Role-Play
arXiv 2026
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
arXiv 2025
VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification
arXiv 2026
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
arXiv 2025
MMaDA: Multimodal Large Diffusion Language Models
arXiv 2025
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
arXiv 2025
CyberV: Cybernetics for Test-time Scaling in Video Understanding
arXiv 2025
RecTok: Reconstruction Distillation along Rectified Flow
arXiv 2025
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
arXiv 2025
VMoBA: Mixture-of-Block Attention for Video Diffusion Models
arXiv 2025
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
arXiv 2025
Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs
ICCV 2025
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer
ICCV 2025
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models
arXiv 2025
Training-free Diffusion Acceleration with Bottleneck Sampling
arXiv 2025
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
arXiv 2025
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
CVPR 2025
SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow
arXiv 2024
RelationBooth: Towards Relation-Aware Customized Object Generation
arXiv 2024
MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning
arXiv 2024
RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything
arXiv 2024
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
ICCV 2023
Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition
arXiv 2022
TS2Vec: Towards Universal Representation of Time Series
arXiv 2021
Customizing Graph Neural Networks using Path Reweighting
arXiv 2021
Frequent co-authors
10 from 25 papers