Rogerio Feris
- Papers: 18
Authored papers
Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination
arXiv 2025
TTRV: Test-Time Reinforcement Learning for Vision Language Models
arXiv 2025
M+: Extending MemoryLLM with Scalable Long-Term Memory
arXiv 2025
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
arXiv 2025
PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies
arXiv 2025
Teaching VLMs to Localize Specific Objects from In-context Examples
ICCV 2025
DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners
arXiv 2024
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
arXiv 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
arXiv 2024
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
ICCV 2023 1
Going Beyond Nouns With Vision & Language Models Using Synthetic Data
ICCV 2023 1
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
arXiv 2023
CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning
CVPR 2023 1
Teaching Structured Vision&Language Concepts to Vision&Language Models
arXiv 2022
Procedural Image Programs for Representation Learning
arXiv 2022
FETA: Towards Specializing Foundation Models for Expert Task Applications
arXiv 2022
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos
ICCV 2021 10
Depthwise Convolution is All You Need for Learning Multiple Visual Domains
arXiv 2019
Affiliations
Frequent co-authors
10 from 18 papers