Jitesh Jain
8 papers
Authored papers
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
arXiv 2026
Slow-Fast Architecture for Video Multi-Modal Large Language Models
arXiv 2025
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
arXiv 2025
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
arXiv 2024
Matting Anything
arXiv 2023
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
CVPR 2024 1
Keys to Better Image Inpainting: Structure and Texture Go Hand in Hand
arXiv 2022
OneFormer: One Transformer to Rule Universal Image Segmentation
CVPR 2023 1
Affiliations
No known affiliations.
Frequent co-authors
10 from 8 papers