Humphrey Shi

PAI-Bench: A Comprehensive Benchmark For Physical AI

arXiv 2025

FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction

arXiv 2025

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

arXiv 2025

Slow-Fast Architecture for Video Multi-Modal Large Language Models

arXiv 2025

IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance

arXiv 2025

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

arXiv 2024

Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment

CVPR 2025

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

CVPR 2025

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

CVPR 2024

UVMap-ID: A Controllable and Personalized UV Map Generative Model

arXiv 2024

Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Community

arXiv 2024

Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators

ICCV 2023

PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor

arXiv 2023

Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models

CVPR 2024

HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models

arXiv 2023

Video Instance Matting

arXiv 2023

Automatic High Resolution Wire Segmentation and Removal

CVPR 2023

Matting Anything

arXiv 2023

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

CVPR 2024

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

CVPR 2024