Gedas Bertasius
Papers: 12
Authored papers
SiLVR: A Simple Language-based Video Reasoning Framework
arXiv 2025
DocSLM: A Small Vision-Language Model for Long Multimodal Document Understanding
arXiv 2025
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
CVPR 2025
BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation
CVPR 2025
Video ReCap: Recursive Captioning of Hour-Long Videos
CVPR 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
arXiv 2024
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
CVPR 2025
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences
arXiv 2024
A Simple LLM Framework for Long-Range Video Question-Answering
arXiv 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
CVPR 2024
SimpleClick: Interactive Image Segmentation with Simple Vision Transformers
ICCV 2023
Is Space-Time Attention All You Need for Video Understanding?
arXiv 2021
Frequent co-authors: 10 from 12 papers