Attribution
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
arXiv 2024
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models
from 2 papers
Zhongyu Wei
Binhao Wu
Jiwen Zhang
Mengfei Du
Minghui Qiu
Ruipu Luo
Xuanjing Huang