Muhammad Maaz

Papers: 12

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model

arXiv 2025

2025

VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

arXiv 2025

2025

Video-CoM: Interactive Video Reasoning via Chain of Manipulations

arXiv 2025

2025

VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

arXiv 2024

2024

PALO: A Polyglot Large Multimodal Model for 5B People

arXiv 2024

2024

SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

ICCV 2023

2023

GLaMM: Pixel Grounding Large Multimodal Model

CVPR 2024

2023

Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

arXiv 2023

2023

PG-Video-LLaVA: Pixel Grounding Large Video-Language Models

arXiv 2023

2023

EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

arXiv 2022

2022

Fine-tuned CLIP Models are Efficient Video Learners

CVPR 2023

2022

MaPLe: Multi-modal Prompt Learning

2022

Affiliations

No known affiliations.

Frequent co-authors

from 12 papers

Salman Khan

Hanoona Rasheed

Fahad Shahbaz Khan

Abdelrahman Shaker

Ming-Hsuan Yang

Fahad Khan

Fahad S. Khan

Hisham Cholakkal

Muhammad Uzair Khattak

2 shared papers

Rao M. Anwer

2 shared papers