Yue Zhao

Authored papers

25

CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

arXiv 2026

2026

Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling

arXiv 2026

2026

FORTIS: Benchmarking Over-Privilege in Agent Skills

arXiv 2026

2026

Interactive Post-Training for Vision-Language-Action Models

arXiv 2025

2025

One-Minute Video Generation with Test-Time Training

CVPR 2025 1

2025

Language-Image Alignment with Fixed Text Encoders

arXiv 2025

2025

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

arXiv 2025

2025

Spherical Leech Quantization for Visual Tokenization and Generation

arXiv 2025

2025

JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model

arXiv 2025

2025

Can Multimodal LLMs Perform Time Series Anomaly Detection?

arXiv 2025

2025

TrustLLM: Trustworthiness in Large Language Models

arXiv 2024

2024

AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?

arXiv 2024

2024

Movie Gen: A Cast of Media Foundation Models

arXiv 2024

2024

VideoPrism: A Foundational Visual Encoder for Video Understanding

arXiv 2024

2024

Image and Video Tokenization with Binary Spherical Quantization

arXiv 2024

2024

GUICourse: From General Vision Language Models to Versatile GUI Agents

arXiv 2024

2024

ViTime: A Visual Intelligence-Based Foundation Model for Time Series Forecasting

arXiv 2024

2024

MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities

arXiv 2024

2024

Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing

CVPR 2025 1

2024

Do Not Train It: A Linear Neural Architecture Search of Graph Neural Networks

arXiv 2023

2023

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

arXiv 2023

2023

LEAP: Liberate Sparse-view 3D Modeling from Camera Poses

arXiv 2023

2023

Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

arXiv 2023

2023

Learning Video Representations from Large Language Models

CVPR 2023 1

2022

Diffusion Models: A Comprehensive Survey of Methods and Applications

arXiv 2022

2022

Affiliations

No known affiliations.

Frequent co-authors

10, from 25 papers