Cordelia Schmid

Papers
20

Attribution (affiliations & profile): Semantic Scholar

Authored papers

20

InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

CVPR 2025 1

2025

Large-scale Pre-training for Grounded Video Caption Generation

ICCV 2025

2025

Streaming Dense Video Captioning

CVPR 2024 1

2024

DataDream: Few-shot Guided Dataset Generation

arXiv 2024

2024

Towards Zero-Shot Multimodal Machine Translation

arXiv 2024

2024

Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy

arXiv 2024

2024

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning

CVPR 2023 1

2023

Verbs in Action: Improving verb understanding in video-language models

ICCV 2023 1

2023

POCO: 3D Pose and Shape Estimation with Confidence

arXiv 2023

2023

Waffling around for Performance: Visual Classification with Random Words and Broad Concepts

ICCV 2023 1

2023

PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation

arXiv 2023

2023

CoVR-2: Automatic Data Construction for Composed Video Retrieval

arXiv 2023

2023

Modular Visual Question Answering via Code Generation

arXiv 2023

2023

Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification

CVPR 2023 1

2023

Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

arXiv 2022

2022

TubeDETR: Spatio-Temporal Video Grounding with Transformers

CVPR 2022 1

2022

Learning to Answer Visual Questions from Web Videos

arXiv 2022

2022

WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction

ICCV 2023 1

2022

Attention Bottlenecks for Multimodal Fusion

NeurIPS 2021 12

2021

Episodic Transformer for Vision-and-Language Navigation

ICCV 2021 10

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 20 papers