Pengchuan Zhang
- Papers: 15
Authored papers (15)
SAM 3: Segment Anything with Concepts
arXiv 2025
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation
arXiv 2024
Learning Video Context as Interleaved Multimodal Sequences
arXiv 2024
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
arXiv 2023
UniVTG: Towards Unified Video-Language Temporal Grounding
ICCV 2023 1
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
ICCV 2023 1
Revisiting the Role of Language Priors in Vision-Language Models
arXiv 2023
GLIPv2: Unifying Localization and Vision-Language Understanding
arXiv 2022
Parameter-efficient Model Adaptation for Vision Transformers
arXiv 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
RegionCLIP: Region-based Language-Image Pretraining
CVPR 2022 1
Image Scene Graph Generation (SGG) Benchmark
arXiv 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
CVPR 2021 1
Florence: A New Foundation Model for Computer Vision
arXiv 2021
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
ECCV 2020 8
Frequent co-authors
10 from 15 papers