Humphrey Shi

Papers
27

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

27

Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light

arXiv 2025

2025

PAI-Bench: A Comprehensive Benchmark For Physical AI

arXiv 2025

2025

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

arXiv 2025

2025

FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction

arXiv 2025

2025

IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance

arXiv 2025

2025

Slow-Fast Architecture for Video Multi-Modal Large Language Models

arXiv 2025

2025

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

CVPR 2025

2024

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

arXiv 2024

2024

Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment

CVPR 2025

2024

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

CVPR 2024

2024

UVMap-ID: A Controllable and Personalized UV Map Generative Model

arXiv 2024

2024

Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Community

arXiv 2024

2024

Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators

ICCV 2023

2023

Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models

CVPR 2024

2023

Matting Anything

arXiv 2023

2023

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

CVPR 2024

2023

HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models

arXiv 2023

2023

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

CVPR 2024

2023

PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor

arXiv 2023

2023

Video Instance Matting

arXiv 2023

2023

Automatic High Resolution Wire Segmentation and Removal

CVPR 2023

2023

Versatile Diffusion: Text, Images and Variations All in One Diffusion Model

ICCV 2023

2022

OneFormer: One Transformer to Rule Universal Image Segmentation

CVPR 2023

2022

Keys to Better Image Inpainting: Structure and Texture Go Hand in Hand

arXiv 2022

2022

AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

CVPR 2022

2021

Escaping the Big Data Paradigm with Compact Transformers

arXiv 2021

2021

CCNet: Criss-Cross Attention for Semantic Segmentation

2018

Affiliations

No known affiliations.

Frequent co-authors

10

from 27 papers