Humphrey Shi
- Papers: 27
Authored papers
Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light
arXiv 2025
PAI-Bench: A Comprehensive Benchmark For Physical AI
arXiv 2025
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
arXiv 2025
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction
arXiv 2025
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
arXiv 2025
Slow-Fast Architecture for Video Multi-Modal Large Language Models
arXiv 2025
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
CVPR 2025 1
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
arXiv 2024
Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment
CVPR 2025 1
OpenBias: Open-set Bias Detection in Text-to-Image Generative Models
CVPR 2024 1
UVMap-ID: A Controllable and Personalized UV Map Generative Model
arXiv 2024
Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Community
arXiv 2024
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
ICCV 2023 1
Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models
CVPR 2024 1
Matting Anything
arXiv 2023
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
CVPR 2024 1
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
arXiv 2023
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
CVPR 2024 1
PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor
arXiv 2023
Video Instance Matting
arXiv 2023
Automatic High Resolution Wire Segmentation and Removal
CVPR 2023 1
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model
ICCV 2023 1
OneFormer: One Transformer to Rule Universal Image Segmentation
CVPR 2023 1
Keys to Better Image Inpainting: Structure and Texture Go Hand in Hand
arXiv 2022
AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition
CVPR 2022 1
Escaping the Big Data Paradigm with Compact Transformers
arXiv 2021
CCNet: Criss-Cross Attention for Semantic Segmentation
Affiliations
Frequent co-authors
10 from 27 papers