Zixian Ma
- Papers: 11
Authored papers
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
arXiv 2026
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models
arXiv 2026
CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?
arXiv 2026
Reinforced Visual Perception with Tools
arXiv 2025
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
arXiv 2025
Explain Before You Answer: A Survey on Compositional Visual Reasoning
arXiv 2025
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations
arXiv 2025
TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action
arXiv 2024
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
arXiv 2024
ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models
arXiv 2024
SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality
Frequent co-authors
10 from 11 papers