Cordelia Schmid
- Papers: 20
Authored papers (20)
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
CVPR 2025 1
Large-scale Pre-training for Grounded Video Caption Generation
ICCV 2025
Streaming Dense Video Captioning
CVPR 2024 1
DataDream: Few-shot Guided Dataset Generation
arXiv 2024
Towards Zero-Shot Multimodal Machine Translation
arXiv 2024
Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy
arXiv 2024
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
CVPR 2023 1
Verbs in Action: Improving verb understanding in video-language models
ICCV 2023 1
POCO: 3D Pose and Shape Estimation with Confidence
arXiv 2023
Waffling around for Performance: Visual Classification with Random Words and Broad Concepts
ICCV 2023 1
PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation
arXiv 2023
CoVR-2: Automatic Data Construction for Composed Video Retrieval
arXiv 2023
Modular Visual Question Answering via Code Generation
arXiv 2023
Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification
CVPR 2023 1
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
arXiv 2022
TubeDETR: Spatio-Temporal Video Grounding with Transformers
CVPR 2022 1
Learning to Answer Visual Questions from Web Videos
arXiv 2022
WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction
ICCV 2023 1
Attention Bottlenecks for Multimodal Fusion
NeurIPS 2021 12
Episodic Transformer for Vision-and-Language Navigation
ICCV 2021 10
Frequent co-authors
10 from 20 papers