Yang Zhao
Papers: 25
Authored papers
PresentAgent-2: Towards Generalist Multimodal Presentation Agents
arXiv 2026
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
arXiv 2025
MediAug: Exploring Visual Augmentation in Medical Imaging
arXiv 2025
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
arXiv 2025
PresentAgent: Multimodal Agent for Presentation Video Generation
arXiv 2025
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
arXiv 2025
SSS: Semi-Supervised SAM-2 with Efficient Prompting for Medical Imaging Segmentation
arXiv 2025
Hidden in Plain Sight: Reasoning in Underspecified and Misspecified Scenarios for Multimodal LLMs
arXiv 2025
ProjectedEx: Enhancing Generation in Explainable AI for Prostate Cancer
arXiv 2025
DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps
arXiv 2025
MedConv: Convolutions Beat Transformers on Long-Tailed Bone Density Prediction
arXiv 2025
VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery
arXiv 2025
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
CVPR 2025
PathoHR: Breast Cancer Survival Prediction on High-Resolution Pathological Images
arXiv 2025
PedDet: Adaptive Spectral Optimization for Multimodal Pedestrian Detection
arXiv 2025
Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search
arXiv 2025
Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
arXiv 2024
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
arXiv 2024
Image Understanding Makes for A Good Tokenizer for Image Generation
arXiv 2024
MedDet: Generative Adversarial Distillation for Efficient Cervical Disc Herniation Detection
arXiv 2024
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding
arXiv 2024
StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models
arXiv 2024
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
arXiv 2023
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
CVPR 2024
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding
ICCV 2023