Gedas Bertasius

Papers: 12

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

SiLVR: A Simple Language-based Video Reasoning Framework

arXiv 2025

2025

DocSLM: A Small Vision-Language Model for Long Multimodal Document Understanding

arXiv 2025

2025

BIMBA: Selective-Scan Compression for Long-Range Video Question Answering

CVPR 2025

2025

BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation

CVPR 2025

2025

Siamese Vision Transformers are Scalable Audio-visual Learners

arXiv 2024

2024

Video ReCap: Recursive Captioning of Hour-Long Videos

CVPR 2024

2024

ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos

CVPR 2025

2024

Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences

arXiv 2024

2024

A Simple LLM Framework for Long-Range Video Question-Answering

arXiv 2023

2023

LoCoNet: Long-Short Context Network for Active Speaker Detection

CVPR 2024

2023

SimpleClick: Interactive Image Segmentation with Simple Vision Transformers

ICCV 2023

2022

Is Space-Time Attention All You Need for Video Understanding?

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors

from 12 papers

Md Mohaiminul Islam

Ce Zhang

Lorenzo Torresani

Mohit Bansal

Taixi Lu

Tanveer Hannan

Thomas Seidl

Tushar Nagarajan

Yan-Bo Lin

Ziyang Wang