Abdelrahman Shaker
- Papers: 12
Authored papers (12)
WorldCache: Content-Aware Caching for Accelerated Video World Models
arXiv 2026
Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
arXiv 2026
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
arXiv 2025
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
arXiv 2025
EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards
arXiv 2025
VideoMolmo: Spatio-Temporal Grounding Meets Pointing
arXiv 2025
GroupMamba: Efficient Group-Based Visual State Space Model
CVPR 2025
PALO: A Polyglot Large Multimodal Model for 5B People
arXiv 2024
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
ICCV 2023
GLaMM: Pixel Grounding Large Multimodal Model
CVPR 2024
XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models
arXiv 2023
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications
arXiv 2022
Frequent co-authors
10 from 12 papers