Christopher Clark
- Papers: 12
Authored papers (12)
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
arXiv 2026
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs
arXiv 2026
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
arXiv 2025
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
arXiv 2025
2 OLMo 2 Furious
arXiv 2024
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025
One Diffusion to Generate Them All
CVPR 2025
Holodeck: Language Guided Generation of 3D Embodied AI Environments
CVPR 2024
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
arXiv 2023
Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models
arXiv 2023
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
arXiv 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
ICCV 2023
Affiliations
Frequent co-authors
10 from 12 papers