Yang Zhao

Authored papers

25

PresentAgent-2: Towards Generalist Multimodal Presentation Agents

arXiv 2026

2026

SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training

arXiv 2025

2025

MediAug: Exploring Visual Augmentation in Medical Imaging

arXiv 2025

2025

Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models

arXiv 2025

2025

PresentAgent: Multimodal Agent for Presentation Video Generation

arXiv 2025

2025

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

arXiv 2025

2025

SSS: Semi-Supervised SAM-2 with Efficient Prompting for Medical Imaging Segmentation

arXiv 2025

2025

Hidden in Plain Sight: Reasoning in Underspecified and Misspecified Scenarios for Multimodal LLMs

arXiv 2025

2025

ProjectedEx: Enhancing Generation in Explainable AI for Prostate Cancer

arXiv 2025

2025

DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps

arXiv 2025

2025

MedConv: Convolutions Beat Transformers on Long-Tailed Bone Density Prediction

arXiv 2025

2025

VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery

arXiv 2025

2025

SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

CVPR 2025 1

2025

PathoHR: Breast Cancer Survival Prediction on High-Resolution Pathological Images

arXiv 2025

2025

PedDet: Adaptive Spectral Optimization for Multimodal Pedestrian Detection

arXiv 2025

2025

Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search

arXiv 2025

2025

Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion

arXiv 2024

2024

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion

arXiv 2024

2024

Image Understanding Makes for A Good Tokenizer for Image Generation

arXiv 2024

2024

MedDet: Generative Adversarial Distillation for Efficient Cervical Disc Herniation Detection

arXiv 2024

2024

Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

arXiv 2024

2024

StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models

arXiv 2024

2024

Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers

arXiv 2023

2023

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

CVPR 2024 1

2023

Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding

ICCV 2023 1

2023

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 25 papers)