Attribution
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark
arXiv 2025
BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities
ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter
CoRL 2024
from 3 papers
Haojie Huang
Renrui Zhang
Xupeng Zhu
Ziyu Guo
Dongzhi Jiang
Haibo Zhao
Hongsheng Li
Jiayi Zhang
Jiazheng Liu
Kai Cheng