Andrew Zisserman

Scaling 4D Representations

arXiv 2024

Moving Object Segmentation: All You Need Is SAM (and Flow)

arXiv 2024

Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark

arXiv 2024

TIM: A Time Interval Machine for Audio-Visual Action Recognition

CVPR 2024

The Sound of Water: Inferring Physical Properties from Pouring Liquids

arXiv 2024

Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

CVPR 2024

TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

ICCV 2023

Verbs in Action: Improving verb understanding in video-language models

ICCV 2023

GestSync: Determining who is speaking without a talking head

arXiv 2023

Helping Hands: An Object-Aware Ego-Centric Video Recognition Model

ICCV 2023

AutoAD: Movie Description in Context

CVPR 2023

A CLIP-Hitchhiker's Guide to Long Video Retrieval

arXiv 2022

CounTR: Transformer-based Generalised Visual Counting

arXiv 2022

Perceiver IO: A General Architecture for Structured Inputs & Outputs

Label, Verify, Correct: A Simple Few Shot Object Detection Method

CVPR 2022

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval

ICCV 2021

TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval

ICCV 2021

Open-Set Recognition: a Good Closed-Set Classifier is All You Need?

PASS: An ImageNet replacement for self-supervised pretraining without humans

NeurIPS Workshop ImageNet_PPF 2021