Mu Cai

Papers: 10

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models

arXiv 2026

2026

Magma: A Foundation Model for Multimodal AI Agents

CVPR 2025

2025

When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios

arXiv 2025

2025

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

arXiv 2024

2024

Matryoshka Multimodal Models

arXiv 2024

2024

Yo'LLaVA: Your Personalized Language and Vision Assistant

arXiv 2024

2024

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

arXiv 2024

2024

CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples

arXiv 2024

2024

VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation

arXiv 2024

2024

Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos

2024

Affiliations

No known affiliations.

Frequent co-authors

from 10 papers

Yong Jae Lee

Jianrui Zhang

Bocheng Zou

Jianfeng Gao

Jianwei Yang

Reuben Tan

Yuzhang Shang

Baolin Peng

Can Qin

Cristina Mata