Muhammad Maaz
- Papers: 12
Authored papers (12)
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
arXiv 2025
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
arXiv 2025
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
arXiv 2025
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
arXiv 2024
PALO: A Polyglot Large Multimodal Model for 5B People
arXiv 2024
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
ICCV 2023 1
GLaMM: Pixel Grounding Large Multimodal Model
CVPR 2024 1
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
arXiv 2023
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
arXiv 2023
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications
arXiv 2022
Fine-tuned CLIP Models are Efficient Video Learners
CVPR 2023 1
MaPLe: Multi-modal Prompt Learning
Affiliations
Frequent co-authors
10 from 12 papers
Salman Khan
Hanoona Rasheed
Fahad Shahbaz Khan
Abdelrahman Shaker
Ming-Hsuan Yang
Fahad Khan
Fahad S. Khan
Hisham Cholakkal
Muhammad Uzair Khattak
Rao M. Anwer