Mu Cai
Authored papers
MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models
arXiv 2026
Magma: A Foundation Model for Multimodal AI Agents
CVPR 2025
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
arXiv 2025
CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples
arXiv 2024
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation
arXiv 2024
Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
arXiv 2024
Yo'LLaVA: Your Personalized Language and Vision Assistant
arXiv 2024
Matryoshka Multimodal Models
arXiv 2024
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
arXiv 2024