Hanoona Rasheed
- Papers: 10
Authored papers (10)
Perception Encoder: The best visual embeddings are not at the output of the network
arXiv 2025
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
arXiv 2025
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
arXiv 2025
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
arXiv 2024
PALO: A Polyglot Large Multimodal Model for 5B People
arXiv 2024
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
ICCV 2023 1
GLaMM: Pixel Grounding Large Multimodal Model
CVPR 2024 1
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
arXiv 2023
Fine-tuned CLIP Models are Efficient Video Learners
CVPR 2023 1
MaPLe: Multi-modal Prompt Learning