Josef Sivic
Papers: 13
Authored papers
REALM: A Real-to-Sim Validated Benchmark for Generalization in Robotic Manipulation
arXiv 2025
Large-scale Pre-training for Grounded Video Caption Generation
ICCV 2025
MassSpecGym: A benchmark for the discovery and identification of molecules
arXiv 2024
Learning to engineer protein flexibility
arXiv 2024
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
CVPR 2023
Meta-Personalizing Vision-Language Models to Find Named Instances in Video
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
arXiv 2022
TubeDETR: Spatio-Temporal Video Grounding with Transformers
CVPR 2022
Learning to Answer Visual Questions from Web Videos
arXiv 2022
Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation
arXiv 2022
Cross-task weakly supervised learning from instructional videos
D2-Net: A Trainable CNN for Joint Detection and Description of Local Features
arXiv 2019
Finding Moments in Video Collections Using Natural Language
arXiv 2019